Foreword



This volume contains the proceedings of the 1998 International Symposium on
Static Analysis (SAS’98), which was held in Pisa (Italy) on September 14-16,
1998, as part of a federated conference with ALP-PLILP’98 and several
workshops. SAS’98 is the annual conference and forum for researchers in all
aspects of static analysis. It follows SAS’94, SAS’95, SAS’96 and SAS’97,
which were held in Namur (Belgium), Glasgow (UK), Aachen (Germany) and Paris
(France), respectively, and the international workshops WSA’92 held in
Bordeaux (France) and WSA’93 held in Padova (Italy).

In response to the call for papers, 48 papers were submitted. All papers 
were reviewed by at least three reviewers and the program committee met in 
Pisa to select 20 papers based on the referee reports. There was a consensus 
at the meeting that the technical papers were of very high quality. In addition 
to the submitted papers, SAS’98 had a number of outstanding invited speakers. 
Roberto Giacobazzi, Peter Lee, Amir Pnueli, Dave Schmidt, Scott Smolka, and 
Bernhard Steffen accepted our invitation to give invited talks or tutorials. Some 
of the papers (or abstracts) based on these talks are also included in this volume. 

SAS’98 has been fortunate to rely on a number of individuals and
organizations. I want to thank all the program committee members and referees
for their hard work in producing the reviews and for such a smooth and
enjoyable program committee meeting. Special thanks go to the conference
chairman, Maurizio Gabbrielli, and to my students in Pisa, who helped me a
lot. More special thanks go to Vladimiro Sassone, who made available to SAS’98
his excellent system for handling submissions and reviews on the web, and to
Ernesto Lastres and Rene Moreno, who were my “system managers”. The use of
this system made my life as program chairman much easier and I strongly
recommend it to future program chairpersons.

SAS’98 was sponsored by Università di Pisa, Compulog Network, Consiglio
Nazionale delle Ricerche, CNR-Gruppo Nazionale di Informatica Matematica,
Comune di Pisa, and Unione Industriali di Pisa.
Bidirectional Data Flow Analysis in Code
Motion: Myth and Reality



Oliver Rüthing

Department of Computer Science, University of Dortmund, Germany
ruething@ls5.cs.uni-dortmund.de



Abstract. Bidirectional data flow analysis has become the standard
technique for solving bit-vector based code motion problems in the presence
of critical edges. Unfortunately, bidirectional analyses have turned out to
be conceptually and computationally harder than their unidirectional
counterparts. In this paper we show that code motion in the presence of
critical edges can be achieved without bidirectional data flow analyses.
This is demonstrated by means of an adaption of our algorithm for lazy code
motion, which is developed from a fresh, specification oriented view.
Besides revealing a better conceptual understanding of the phenomena caused
by critical edges, this also lays the foundation for a new and efficient
hybrid iteration strategy that intermixes conventional round-robin iteration
with exhaustive iteration on critical subparts.



1 Motivation 

In data flow analysis, equation systems involving bidirectional dependencies,
i.e. dependencies from predecessor nodes as well as from successor nodes, are
a well-known source of various kinds of difficulties. First, bidirectional
equation systems are conceptually hard to understand. This is mainly caused
by the lack of a corresponding operational specification like the one given
by the meet over all paths (MOP) solution of a unidirectional data flow
problem. Furthermore, Khedker and Dhamdhere recently proved that the costs
for solving bidirectional data flow analysis problems may be significantly
worse than for solving their unidirectional counterparts. This particularly
holds for the only practically relevant class of bidirectional analyses,
bit-vector based code motion problems. In fact, all known bidirectional
problems are of this kind. Even more specifically, they are more or less
variations of Morel’s and Renvoise’s pioneering algorithm for the elimination
of partial redundancies. Independently, different researchers documented that
bidirectionality is only required in programs that have critical edges, i.e.
edges in a flow graph that directly lead from branch nodes to join nodes (see
Fig. 1a for illustration). Ideally, critical edges can be completely
eliminated by inserting empty synthetic nodes as depicted in Fig. 1b. In this
example, the additional placement point enables the code motion
transformation shown in Fig. 1c, which eliminates the partially redundant
G. Levi (Ed.): SAS’98, LNCS 1503, pp. 1-||3 1998. 
Fig. 1. a) Critical Edge b) Edge splitting c) Transformational gain through edge
splitting



computation on the path through node 1 and 3. However, in practice splitting
of critical edges is sometimes avoided, since it may cause additional
unconditional jumps or decrease the potential for pipelined execution.
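
The notions of a critical edge and of edge splitting can be sketched in a few lines of Python. This is only an illustration under assumptions: the three-edge fragment is a hypothetical graph shaped like the situation described for Fig. 1a (node 2 is a branch node, node 3 a join node), and the function names and the naming scheme for synthetic nodes are invented for the example.

```python
from collections import defaultdict


def critical_edges(edges):
    """Return the edges that lead directly from a branch node to a join node."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return [(s, t) for s, t in edges if out_deg[s] > 1 and in_deg[t] > 1]


def split_critical_edges(edges):
    """Replace every critical edge (s, t) by s -> synthetic node -> t."""
    crit = set(critical_edges(edges))
    result = []
    for s, t in edges:
        if (s, t) in crit:
            synth = f"S_{s}_{t}"  # hypothetical name for the empty synthetic node
            result += [(s, synth), (synth, t)]
        else:
            result.append((s, t))
    return result


# Hypothetical fragment: 1 -> 3, 2 -> 3 and 2 -> 4, so that 2 -> 3 is critical.
edges = [(1, 3), (2, 3), (2, 4)]
print(critical_edges(edges))
print(split_critical_edges(edges))
```

After splitting, the synthetic node offers a placement point that lies only on the path from node 2 to node 3, which is what the transformation described for Fig. 1c relies on.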

In this paper we investigate a new approach to code motion in the presence of
critical edges. This is demonstrated by presenting a “critical” variant of our
algorithm for lazy code motion. However, the principal ideas carry over
straightforwardly to all related code motion algorithms that employ
bidirectional data flow analyses.

Our algorithm is developed from a rigorous, specification oriented view. This
particularly allows us to separate different concerns. While safety in code
motion is naturally associated with forward and backward oriented propagation
of information, the presence of critical edges requires imposing additional
homogeneity properties, which can be expressed in terms of a side propagation
of information. Actually, this clear separation allows us to avoid the usage
of bidirectional dependencies in our specification. With regard to the
variant of lazy code motion the contribution of this paper is threefold:



– On a conceptual level we give a unidirectional specification of the problem.
This particularly induces the first MOP characterization of code motion in
the presence of critical edges.

– We present a novel hybrid iteration strategy that separates the information
flow along critical edges from the information flow along the uncritical
ones. While the latter is accomplished by an outer schedule proceeding in
standard round-robin discipline, the critical information flow is treated
exhaustively by an inner schedule.

– Almost as a by-product we obtain the first lifetime optimal algorithm for
partial redundancy elimination in the presence of critical edges.



^ This is not possible in Fig. 1a, since hoisting a + b to node 2 introduces a
new value on the rightmost path.

^ Sometimes critical edges are left unsplit only in situations where splitting
may harm the final code generation.
1.1 Related Work

As Khedker and Dhamdhere and, more recently, Masticola et al. noticed,
critical edges do not add to the worst-case time complexity of iterative data
flow analyses based on a workset approach. However, this result cannot be
generalized to bit-vector analyses, where the iteration order has to be
organized such that structural properties of the flow graph are exploited in
order to take maximum benefit of bit-wise parallel updates through efficient
bit-vector operations.

Hecht and Ullman proved an upper bound on the number of round-robin
iterations that are necessary for stabilization of monotone, unidirectional
bit-vector problems. When proceeding in reverse postorder (or postorder for
backward problems), $d + 2$ round-robin iterations are sufficient, where $d$
is the depth of the flow graph, i.e. the maximum number of backedges on an
acyclic program path.

Recently, Dhamdhere and Khedker generalized this result towards bidirectional
problems. However, a major drawback of their setting is that it is pinned to
round-robin iterations. Unfortunately, such a schedule does not fit well to
situations where information is side-propagated along critical edges. In this
light, it is not astonishing that their results on the convergence speed of
bidirectional bit-vector analyses are quite disappointing. They replace the
depth $d$ of a flow graph by its width $w$, which is the number of
non-conform edge traversals on an information flow path. Unfortunately, the
width is not a structural property of the flow graph, but varies with the
problem under consideration, and unlike $d$, which is 0 for acyclic programs,
it is not even bounded in this case. Actually, the notion of width does not
match the intuition associated with the name, as even “slim” programs may
have a large width. An intuitive reason for this behaviour is given in
Fig. 2, which shows a program fragment with a number of critical edges. Let
us assume that information flow in this example follows the equation



\[
\mathit{Info}(n) \;=\; \sum_{m \in pred(n)} \Bigl( \mathit{Info}(m) + \sum_{n' \in succ(m)} \mathit{Info}(n') \Bigr)
\]



which means that the information at node $n$ is set to true if the information
at a predecessor or the information of any “sibling” of $n$ is true.

We can easily see that the width of a flow graph with such a fragment directly 
depends on the number of critical edges, and therefore possibly grows linearly 
with the “length” of the program. It should be noted that such programs are by 

^ Informally, an information flow path is a sequence of backwards or forwards
directed edges along which a change of information can be propagated. A
forward traversal along a forward edge or a backward traversal along a
backward edge is conform with a round-robin schedule proceeding (forwards) in
reverse postorder. The other two kinds of traversals are non-conform.
Complementary notions apply to round-robin iterations proceeding in
postorder.
Fig. 2. a) Acyclic program path responsible for the width of a program with reverse
postordering of the nodes b) Slow information propagation in round-robin iterations

no means pathological and thus the linear growth of the width is not unlikely
for real-life programs. In fact, considering the reverse postorder of nodes as
given in Fig. 2a, the large width is actually reflected in a poor behaviour of
a round-robin iteration. Fig. 2b shows how the information slowly propagates
along the obvious “path” displayed in this example, being stopped in each
round-robin iteration at a non-conform (critical) edge.
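
The slow propagation can be reproduced with a small experiment. The following Python sketch is an illustration under assumptions, not a reconstruction of Fig. 2: it builds a hypothetical zig-zag fragment in which branch node t_i leads to the nodes b_i and b_(i+1), seeds the information at the last node b_k (the seed term is an addition needed to have something to propagate), and iterates the Info equation in place in a fixed node order until nothing changes.

```python
def round_robin_passes(order, pred, succ, seed):
    """Iterate Info(n) = seed(n) or OR over predecessors m of n of
    (Info(m) or OR over successors n' of m of Info(n')), visiting the nodes
    in `order` and updating in place. Returns the number of passes, including
    the final pass that only detects stabilization."""
    info = {n: n in seed for n in order}
    passes, changed = 0, True
    while changed:
        passes += 1
        changed = False
        for n in order:
            new = info[n] or any(
                info[m] or any(info[x] for x in succ[m]) for m in pred[n]
            )
            if new != info[n]:
                info[n], changed = new, True
    return passes


def zigzag(k):
    """Hypothetical fragment: branch nodes t0..t(k-1), t_i -> b_i and b_(i+1)."""
    tops = [f"t{i}" for i in range(k)]
    bots = [f"b{i}" for i in range(k + 1)]
    succ = {t: [f"b{i}", f"b{i + 1}"] for i, t in enumerate(tops)}
    succ.update({b: [] for b in bots})
    pred = {n: [] for n in tops + bots}
    for m, targets in succ.items():
        for n in targets:
            pred[n].append(m)
    return tops, bots, pred, succ


k = 6
tops, bots, pred, succ = zigzag(k)
# Visiting b0, b1, ..., bk moves the information one node per pass.
slow = round_robin_passes(tops + bots, pred, succ, seed={f"b{k}"})
# Visiting bk, ..., b0 happens to follow the direction of the side flow.
fast = round_robin_passes(tops + bots[::-1], pred, succ, seed={f"b{k}"})
print(slow, fast)
```

In this sketch the number of passes in the unfavourable order grows linearly with the length of the zig-zag chain, while the favourable order stabilizes after a constant number of passes. Since a fixed round-robin order cannot be favourable for side flow in both directions, this illustrates why the width rather than the depth governs the convergence speed.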

Dhamdhere and Patil proposed an elimination method for bidirectional problems
that is as efficient as in the unidirectional case. However, it is restricted
to a quite pathological class of problems, namely weakly bidirectional
bit-vector problems, and, as usual for elimination methods, is mainly designed
for reducible control flow.

Finally, our hybrid approach shares with an earlier hybrid iteration strategy
the idea of mixing a round-robin schedule with exhaustive subiterations.
However, the subiterations of that strategy stay within strongly connected
components, and the approach is solely designed to speed up unidirectional
iterations.



2 Preliminaries 

We consider programs in terms of directed flow graphs $G = (N, E, s, e)$ with
node set $N$, edge set $E$ and unique start and end nodes $s$ and $e$,
respectively. Nodes $n, m, \ldots \in N$ represent (elementary) statements and
are assumed to lie on a path from $s$ to $e$. Finally, predecessors and
successors of a node $n \in N$ are denoted by $pred(n)$ and $succ(n)$,
respectively, and $P[n, m]$ stands for the set of finite paths between nodes
$n$ and $m$.



^ Shaded circles indicate the flow of information along the “path”.
Local Predicates. As usual, our reasoning is based on an arbitrary but fixed
expression $\varphi$ that is the running object for code movement. With each
node of the flow graph two local predicates are associated.

$\mathit{Comp}(n)$: $\varphi$ is computed at $n$, i.e. $\varphi$ is part of the
right-hand side expression associated with $n$.

$\mathit{Transp}(n)$: $n$ is transparent for $\varphi$, i.e. none of
$\varphi$’s variables is modified at $n$.



Global Predicates. Based on these local predicates global program properties
are specified. Usually, global predicates are associated with both entries and
exits of nodes. In order to keep the presentation simple we assume that every
node is split into an entry node and an empty exit node which inherit the set
of predecessors and successors from the original node, respectively, and which
are assumed to be connected by an edge leading from the entry node to the exit
node. This step allows us to restrict our reasoning to entry predicates. It
should be noted, however, that this step is solely conceptual and does not
eliminate any critical edge.

In this paper partial redundancy elimination (PRE), or code motion (CM)
as a synonym, stands for program transformations that

1. insert some instances of initialisation statements $h_\varphi := \varphi$
at program points, where $h_\varphi$ is a temporary variable that is
exclusively assigned to $\varphi$, and

2. replace some original occurrences of $\varphi$ by a usage of $h_\varphi$.

In order to guarantee that the semantics of the argument program is preserved,
we require that a code motion transformation must be admissible. Intuitively,
this means that every insertion of a computation is safe, i.e. on no program
path the computation of a new value is introduced at initialization sites, and
that every substitution of an original occurrence of $\varphi$ by $h_\varphi$
is correct, i.e. $h_\varphi$ always represents the same value as $\varphi$ at
use sites. This requires that $h_\varphi$ is properly initialized on every
program path leading to some use site in a way such that no modification
occurs afterwards.

3 Code Motion in the Absence of Critical Edges 

Before presenting our new approach to PRE in the presence of critical edges
we shall first briefly recall the basic steps of lazy code motion as a typical
representative of an algorithm that relies on the absence of critical edges.
Lazy code motion was the first algorithm for partial redundancy elimination
that succeeded in removing partial redundancies as well as possible, while
avoiding any unnecessary register pressure. This was mainly achieved by a
rigorous redesign of Morel’s and Renvoise’s algorithm. Starting from a
specification oriented view, the key point was a hierarchical separation
between the primary and the secondary concern of partial redundancy
elimination, namely to minimize the number of computations and to avoid
unnecessary register pressure, respectively. This hierarchical organization
is reflected in a two-step design of the algorithm: lazy code motion rests on
busy code motion. In the following we briefly summarize the details of these
transformations.



3.1 Busy Code Motion 



Busy code motion (BCM) places initializations as early as possible while
replacing all original occurrences of $\varphi$. This is achieved by
determining the earliest program points where an initialization is safe.
Technically, the range of safe program points can be determined by separately
computing down-safe and up-safe program points. Both are given through the
greatest solutions of two unidirectional data flow analyses, respectively.



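\[
\begin{aligned}
\mathit{DnSafe}(n) &= (n \neq e) \cdot \Bigl(\mathit{Comp}(n) + \mathit{Transp}(n) \cdot \prod_{m \in succ(n)} \mathit{DnSafe}(m)\Bigr) \\
\mathit{UpSafe}(n) &= (n \neq s) \cdot \prod_{m \in pred(n)} \mathit{Transp}(m) \cdot \bigl(\mathit{Comp}(m) + \mathit{UpSafe}(m)\bigr) \\
\mathit{Safe}(n) &= \mathit{UpSafe}(n) + \mathit{DnSafe}(n) \\
\mathit{Earliest}(n) &= \mathit{Safe}(n) \cdot \Bigl((n = s) + \sum_{m \in pred(n)} \overline{\mathit{Safe}(m)}\Bigr)
\end{aligned}
\]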



Despite its surprising simplicity, BCM already reaches computational
optimality, i.e. programs resulting from this transformation have at most as
many $\varphi$-occurrences on every path from $s$ to $e$ as any other result
of an admissible code motion transformation.
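
The four BCM predicates can be computed by a plain fixpoint iteration. The Python sketch below is a minimal illustration under assumptions: the six-node flow graph, the node names and the helper names are hypothetical, every node is taken to be transparent, and the greatest solutions are obtained by starting from "all true" and re-applying the equations for down-safety, up-safety, safety and earliestness until nothing changes.

```python
def greatest_fixpoint(nodes, step):
    """Start from 'all true' and re-apply `step` until nothing changes."""
    val = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = step(n, val)
            if new != val[n]:
                val[n], changed = new, True
    return val


def bcm(nodes, edges, s, e, comp, transp):
    """Return DnSafe, UpSafe, Safe and Earliest as dictionaries node -> bool."""
    succ = {n: [t for (u, t) in edges if u == n] for n in nodes}
    pred = {n: [u for (u, t) in edges if t == n] for n in nodes}
    dn = greatest_fixpoint(nodes, lambda n, v: n != e and (
        n in comp or (n in transp and all(v[m] for m in succ[n]))))
    up = greatest_fixpoint(nodes, lambda n, v: n != s and all(
        m in transp and (m in comp or v[m]) for m in pred[n]))
    safe = {n: up[n] or dn[n] for n in nodes}
    earliest = {n: safe[n] and (n == s or any(not safe[m] for m in pred[n]))
                for n in nodes}
    return dn, up, safe, earliest


# Hypothetical graph without critical edges; the expression is computed at
# nodes 1 and 3, so the computation at 3 is partially redundant.
nodes = ["s", "1", "2", "3", "5", "e"]
edges = [("s", "1"), ("s", "2"), ("s", "5"), ("1", "3"),
         ("2", "3"), ("3", "e"), ("5", "e")]
dn, up, safe, earliest = bcm(nodes, edges, "s", "e",
                             comp={"1", "3"}, transp=set(nodes))
print(sorted(n for n in nodes if earliest[n]))
```

In this example the start node is not down-safe because of the computation-free path through node 5, so the earliest safe points are the nodes 1 and 2. Initializing there and replacing the original occurrences leaves exactly one computation on every path through node 3.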

3.2 Lazy Code Motion 

In addition to BCM, lazy code motion (LCM) takes the lifetimes of temporaries
into account. This is accomplished by placing initialisations as late as
possible but as early as necessary, where the latter requirement means
“necessary in order to reach computational optimality”. Technically, this is
achieved by determining the latest program points to which a
BCM-initialisation might be delayed, which leads to one additional
unidirectional data flow analysis.

^ As usual, $\cdot$, $+$ and overlining stand for logical conjunction,
disjunction and negation, respectively.

^ In the original presentation of lazy code motion an additional analysis is
employed determining isolated program points, i.e. program points where
initialisations are only used immediately afterwards. This aspect, however,
can be treated independently by means of a postprocess. For the sake of
simplicity we skip the isolation analysis in this paper.
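\[
\begin{aligned}
\mathit{Delayed}(n) &= \mathit{Earliest}(n) + (n \neq s) \cdot \prod_{m \in pred(n)} \mathit{Delayed}(m) \cdot \overline{\mathit{Comp}(m)} \\
\mathit{Latest}(n) &= \mathit{Delayed}(n) \cdot \Bigl(\mathit{Comp}(n) + \sum_{m \in succ(n)} \overline{\mathit{Delayed}(m)}\Bigr)
\end{aligned}
\]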



LCM is computationally optimal as well as lifetime optimal, i.e. the temporary
associated with $\varphi$ has a lifetime range that is included in the
lifetime range of any other program resulting from a computationally optimal
code motion transformation. The lifetime range of a temporary comprises all
nodes whose exits occur in between an initialisation site and a use site such
that no other initialisations are situated in between.
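
The delayability analysis can be sketched in the same style. The following Python fragment is an illustration under assumptions: the diamond-shaped graph is hypothetical, all nodes are taken to be transparent, the set of earliest points is passed in as an input (for this graph BCM would yield the start node), and the greatest solution of the Delayed equation is assumed to be the intended one, as for the safety analyses.

```python
def lcm_points(nodes, edges, s, comp, earliest):
    """Return the sets of delayed and latest nodes for given earliest points."""
    succ = {n: [t for (u, t) in edges if u == n] for n in nodes}
    pred = {n: [u for (u, t) in edges if t == n] for n in nodes}
    delayed = {n: True for n in nodes}  # start from 'all true'
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = n in earliest or (n != s and all(
                delayed[m] and m not in comp for m in pred[n]))
            if new != delayed[n]:
                delayed[n], changed = new, True
    latest = {n for n in nodes if delayed[n] and (
        n in comp or any(not delayed[m] for m in succ[n]))}
    return {n for n in nodes if delayed[n]}, latest


# Hypothetical diamond: node 1 branches to 2 and 3, which join at 4.
# The expression is computed at nodes 2 and 4.
nodes = ["s", "1", "2", "3", "4", "e"]
edges = [("s", "1"), ("1", "2"), ("1", "3"),
         ("2", "4"), ("3", "4"), ("4", "e")]
delayed, latest = lcm_points(nodes, edges, "s",
                             comp={"2", "4"}, earliest={"s"})
print(sorted(delayed), sorted(latest))
```

Here the initialisation that BCM would place at the start node is delayed to the nodes 2 and 3. The number of computations on every path stays the same, but the temporary is no longer alive along the path from the start node to the branch.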

4 Code Motion in the Presence of Critical Edges 

In this section our new approach to PRE in the presence of critical edges is
elaborated in full detail. First we shall investigate the principal
differences to the setting presented in Sect. 3. As opposed to flow graphs
without critical edges there are usually no computationally optimal
representatives. In fact, Fig. 3 shows two admissible, but computationally
incomparable transformations that cannot be improved any further. The first
one is simply given by the identical transformation of the program in
Fig. 3a, the result of the second one is displayed in Fig. 3b. Each of the
resulting programs has exactly one computation on the path that is emphasised
in the dark shade of grey, while having two computations on the path
emphasised in the light shade of grey, respectively. Thus there is no
computationally optimal code motion transformation with respect to the
original program in Fig. 3a.




Fig. 3. a & b) Incomparable admissible program transformations c) Program
degradation through a naive adaption of busy expression motion

^ Elsewhere we show that this optimality result is only adequate for flat
universes of expressions. If both composite expressions and their
subexpressions are moved, then the notion of lifetime optimality changes and a
significantly more sophisticated technique has to be applied. Nonetheless, LCM
still provides a basic ingredient of this approach.
This problem can be overcome by restricting the range of program
transformations to those that are profitable, which means those that actually
improve their argument programs. Note that this requirement excludes Fig. 3b
as a reasonable code motion transformation. Obviously, profitability does not
provide a further restriction for flow graphs without critical edges, where
computationally optimal code motion transformations are guaranteed to exist.
In the presence of critical edges, however, this additional constraint is
necessary in order to yield computationally optimal results at all.



4.1 Busy Code Motion 

In this section we will develop a counterpart to BCM in the presence of
critical edges. After briefly sketching the difficulties that prohibit a
straightforward adaption of the uncritical solution, a correct approach is
systematically developed from a specification that incorporates the special
role of critical edges.

Unfortunately, BCM as presented in Sect. 3.1 cannot straightforwardly be
applied to flow graphs with critical edges. This is because such a naive
adaption may include non-profitable transformations, as illustrated in
Fig. 3c, where the marked range of down-safe program points would yield
earliest initialisation points at nodes 1, 2 and 6.



Homogeneous Propagation of Down-Safety. The key to a useful critical variant
of BCM is to impose an additional homogeneity requirement on down-safety that
ensures that the information propagates either to all or to none of a node's
predecessors, which guarantees that earliest program points become a proper
upper borderline of the region of safe program points. In fact, in the
absence of critical edges down-safety has the following homogeneity property:

\[
\forall\, n \in N.\ \mathit{DnSafe}(n) \Rightarrow \bigl(\forall\, m \in pred(n).\ \mathit{Safe}(m) \;\lor\; \forall\, m \in pred(n).\ \neg\mathit{DnSafe}(m)\bigr)
\]

Note that the first term of the disjunction uses safety rather than
down-safety, since propagation of down-safety need not be considered for
predecessors that are up-safe anyhow. In the presence of critical edges this
property has to be enforced explicitly. For instance, in Fig. 3c node 6 as
well as node 3 are down-safe, while node 4 is not. Therefore, let us consider
the following notion of homogeneous down-safety:

Definition 1 (Homogeneous Down-Safety). A predicate $\mathit{HDnSafe}$ on the
nodes of $N$ is a homogeneous down-safety predicate iff for any $n \in N$

1. $\mathit{HDnSafe}$ is conform with down-safety:

\[
\mathit{HDnSafe}(n) \Rightarrow (n \neq e) \land \bigl(\mathit{Comp}(n) \lor \mathit{Transp}(n) \land \forall\, m \in succ(n).\ \mathit{HDnSafe}(m)\bigr)
\]

^ Modifying this example by removing the computation of a + b from node 1
would even result in a transformation that does not improve any path while
strictly impairing some.

^ In the absence of critical edges this makes no difference to
$\forall\, m \in pred(n).\ \mathit{DnSafe}(m)$.
2. $\mathit{HDnSafe}$ is homogeneous:

\[
\mathit{HDnSafe}(n) \Rightarrow \bigl(\forall\, m \in pred(n).\ (\mathit{HDnSafe}(m) \lor \mathit{UpSafe}(m)) \;\lor\; \forall\, m \in pred(n).\ \neg\mathit{HDnSafe}(m)\bigr)
\]



Obviously, homogeneous down-safety predicates are closed under “union”. Thus
there exists a unique largest homogeneous down-safety predicate
$\mathit{DnSafe}_{\mathit{Hom}}$, which gives rise to a homogeneous version of
safety, too:

\[
\forall\, n \in N.\ \mathit{Safe}_{\mathit{Hom}}(n) = \mathit{DnSafe}_{\mathit{Hom}}(n) \lor \mathit{UpSafe}(n)
\]

It should be noted that this definition is developed from a purely
specification oriented reasoning and can be seen as a first rigorous
characterization of down-safety in the presence of critical edges:
down-safety is described by a backward directed data flow problem which is
restricted by additional homogeneity constraints. This is in contrast to
other algorithms, where bidirectional equation systems are postulated in an
ad-hoc fashion without any separation of their functional components.

Earliest program points are defined as in the uncritical case, but with the 
difference of using the homogeneous version of down-safety in place of the usual 
one. 



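\[
\mathit{Earliest}_{\mathit{Hom}}(n) = \mathit{DnSafe}_{\mathit{Hom}}(n) \cdot \Bigl((n = s) + \sum_{m \in pred(n)} \overline{\mathit{Safe}_{\mathit{Hom}}(m)}\Bigr)
\]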



The earliest program points serve as insertion points of BCM for flow graphs
with critical edges (CBCM). With a similar argumentation as for BCM it is easy
to prove that CBCM is indeed computationally optimal, however, only relative
to the profitable transformations.



Computing CBCM: The Data Flow Analyses. In this part we present how the
specifying solution of CBCM can be translated into appropriate data flow
analyses determining the range of homogeneously safe program points. We will
discuss three alternative approaches: (1) a “classical” one via bidirectional
analyses, (2) a new non-standard approach that transforms the problem into one
with purely unidirectional equations and (3) a hybrid approach that separates
backwards flow from side propagation.



The Bidirectional Approach The specification of Definition 1 can
straightforwardly be transferred into a bidirectional equation system for
down-safety.



^ This means the predicate defined by the pointwise disjunction of the
predicate values.
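\[
\begin{aligned}
\mathit{UpSafe}(n) &= (n \neq s) \cdot \prod_{m \in pred(n)} \mathit{Transp}(m) \cdot \bigl(\mathit{Comp}(m) + \mathit{UpSafe}(m)\bigr) \\
\mathit{DnSafe}_{\mathit{Hom}}(n) &= (n \neq e) \cdot \Bigl(\mathit{Comp}(n) + \mathit{Transp}(n) \cdot \prod_{m \in succ(n)} \mathit{DnSafe}_{\mathit{Hom}}(m) \cdot \prod_{n' \in pred(m)} \bigl(\mathit{UpSafe}(n') + \mathit{DnSafe}_{\mathit{Hom}}(n')\bigr)\Bigr) \\
\mathit{Safe}_{\mathit{Hom}}(n) &= \mathit{UpSafe}(n) + \mathit{DnSafe}_{\mathit{Hom}}(n)
\end{aligned}
\]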



Unfortunately, the above bidirectional data flow problem shares the problems
sketched in Fig. 2 when subjected to a round-robin iteration strategy. In
fact, violation of homogeneous safety follows exactly the same definition
pattern as Info does in this example. Hence slow propagation of down-safety
would also be apparent in CBCM.
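
The effect of the homogeneity constraint can be observed on a small example. The Python sketch below is an illustration under assumptions: the flow graph with the single critical edge from b to j is hypothetical, all nodes are taken to be transparent, and the bidirectional system is solved by a naive iteration from "all true" that applies the equations literally, without any attempt at an efficient schedule.

```python
def solve(nodes, step):
    """Greatest fixpoint: start from 'all true', iterate until stable."""
    val = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = step(n, val)
            if new != val[n]:
                val[n], changed = new, True
    return val


def analyses(nodes, edges, s, e, comp, transp):
    """Return UpSafe, plain DnSafe and homogeneous DnSafe."""
    succ = {n: [t for (u, t) in edges if u == n] for n in nodes}
    pred = {n: [u for (u, t) in edges if t == n] for n in nodes}
    up = solve(nodes, lambda n, v: n != s and all(
        m in transp and (m in comp or v[m]) for m in pred[n]))
    dn = solve(nodes, lambda n, v: n != e and (
        n in comp or (n in transp and all(v[m] for m in succ[n]))))
    # Bidirectional variant: every predecessor x of a successor m must be
    # up-safe or homogeneously down-safe as well.
    dn_hom = solve(nodes, lambda n, v: n != e and (
        n in comp or (n in transp and all(
            v[m] and all(up[x] or v[x] for x in pred[m])
            for m in succ[n]))))
    return up, dn, dn_hom


# Hypothetical graph: b is a branch node, j a join node, b -> j is critical.
# The expression is computed at j only.
nodes = ["s", "a", "b", "j", "x", "e"]
edges = [("s", "a"), ("s", "b"), ("a", "j"), ("b", "j"),
         ("b", "x"), ("j", "e"), ("x", "e")]
up, dn, dn_hom = analyses(nodes, edges, "s", "e",
                          comp={"j"}, transp=set(nodes))
print(sorted(n for n in nodes if dn[n]),
      sorted(n for n in nodes if dn_hom[n]))
```

With plain down-safety node a is down-safe while its "sibling" b is not, so a naive earliest placement would initialize at both a and j and thereby put two computations on the path through a. The homogeneous variant withdraws down-safety from a, which leaves j as the only candidate.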

The Unidirectional Approach It is easy to see that in the bidirectional
equation system there is no “true” forward propagation of down-safety
information, as the scope of the term
$\prod_{n' \in pred(m)} \bigl(\mathit{UpSafe}(n') + \mathit{DnSafe}_{\mathit{Hom}}(n')\bigr)$
is restricted in its context. Rather this can be seen as a “side propagation”
of down-safety information along zig-zag paths. For a technical description
let us define the set of zig-zag successors $zsucc(n)$ of a node $n \in N$ as
the smallest set of nodes satisfying (see Fig. 4 for illustration):

1. $succ(n) \subseteq zsucc(n)$

2. $\forall\, m \in zsucc(n).\ succ(pred(m)) \subseteq zsucc(n)$

In our example zig-zag propagation of non-down-safety is further stopped at
nodes where up-safety can be established. Hence we introduce a parameterized
notion of $zsucc(n)$, which is defined for $M \subseteq N$ by:

1. $succ(n) \subseteq zsucc_M(n)$

2. $\forall\, m \in zsucc_M(n).\ succ(pred(m) \setminus M) \subseteq zsucc_M(n)$

With $X_{US} = \{m \in N \mid \mathit{UpSafe}(m)\}$ the equation for
down-safety can be rewritten as:



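\[
\mathit{DnSafe}_{\mathit{Hom}}(n) = (n \neq e) \cdot \Bigl(\mathit{Comp}(n) + \mathit{Transp}(n) \cdot \prod_{m \in zsucc_{X_{US}}(n)} \mathit{DnSafe}_{\mathit{Hom}}(m)\Bigr)
\]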



Note that this equation system can be seen as a unidirectional one that
operates on a flow graph that is enriched by shortcut edges drawn between
nodes and their zig-zag successors (see Fig. 4b).
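
The closure definition of zig-zag successors translates directly into a worklist computation. The Python sketch below is an illustration under assumptions: the function name, the `stop` parameter (playing the role of the set M) and the three-branch zig-zag fragment are hypothetical and not taken from Fig. 4.

```python
def zsucc(n, succ, pred, stop=frozenset()):
    """Smallest set containing succ(n) that is closed under stepping from a
    member m back to a predecessor not in `stop` and forward again."""
    result, work = set(succ[n]), list(succ[n])
    while work:
        m = work.pop()
        for p in pred[m]:
            if p in stop:  # side propagation does not cross nodes of M
                continue
            for t in succ[p]:
                if t not in result:
                    result.add(t)
                    work.append(t)
    return result


# Hypothetical zig-zag fragment: t_i branches to b_i and b_(i+1).
succ = {"t0": ["b0", "b1"], "t1": ["b1", "b2"], "t2": ["b2", "b3"],
        "b0": [], "b1": [], "b2": [], "b3": []}
pred = {n: [m for m in succ if n in succ[m]] for n in succ}

print(sorted(zsucc("t0", succ, pred)))
print(sorted(zsucc("t0", succ, pred, stop={"t1"})))
```

Without a stop set every branch node of the chain reaches every join node, which is the source of the large number of shortcut edges. Putting t1 into the stop set cuts the zig-zag path behind b1.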
Fig. 4. (a) Program fragment with a nest of critical edges (b) Zig-zag successors and
virtual shortcut edges of node 1



However, we do not actually recommend performing such a transformation, as
this would require introducing an unnecessarily large number of additional
edges. For a zig-zag chain of $k$ critical edges as shown in Fig. 4a the
number of shortcut edges is of order $k^2$. Although long zig-zag chains of
critical edges can be expected to be rare in practice, we will show that
information propagation can be organized without such a blow-up in the number
of edges. However, the important contribution of the unidirectional approach
is that it provides the first meet over all paths characterization of PRE in
the presence of critical edges.



The Hybrid Approach: The hybrid approach addresses the organization of the
iteration process rather than the equation system itself. As we have learned,
bidirectional problems like our formulation of homogeneous down-safety do not
fit together with a round-robin schedule based upon postorder traversals. The
hybrid approach modifies the conventional round-robin schedule by integrating
zig-zag propagation of information. This is achieved by clustering the nodes
in a flow graph in a way such that side propagation of information can
benefit from much potential for simultaneous work. The overall schedule of
the approach can be sketched as follows:



Preprocess: Collapsing of nodes according to side flow of information
Outer Schedule: Process the collapsed nodes in postorder until stabilization
is reached, performing an

    Inner Schedule:

    1. For each node within the collapsed one perform information
       propagation along its outgoing uncritical edges

    2. Perform exhaustive information propagation along the outgoing
       critical edges within the collapsed node

In the following we will go into the details of this process. 

• The preprocess: Clustering of nodes groups together nodes of $N$ according
to the following equivalence relation:

\[
n \equiv m \iff zsucc(n) = zsucc(m)
\]

^ Actually, only the notion of paths has to be extended towards paths across
shortcut edges.
It should be noted that $G$ can be decomposed into its equivalence classes
easily by tracing zig-zag paths of critical edges originating at an
unprocessed node. For instance, starting with node 1 in Fig. 5 we obtain the
equivalence class $\{1, 2, 3\}$ by following the critical edges. Clearly, this
process can be managed in order $O(e)$, where $e$ denotes the number of edges
in $E$. All nodes of an equivalence class are collapsed into a single node
that inherits all incoming and outgoing edges of its members (see Fig. 5 for
illustration).
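
The preprocess can be sketched with a union-find structure. The Python fragment below is an illustration under assumptions: it relies on the observation that, by the closure definition of zig-zag successors, two nodes having at least one successor have the same zig-zag successors exactly if they are connected by a chain of shared successors, and it assumes that the end node is the only node without successors. The example graph, the function names and the choice of the smallest member as representative are hypothetical.

```python
def equivalence_classes(nodes, edges):
    """Group nodes that are linked by shared successors (near-linear time)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    first_pred = {}
    for u, t in edges:
        if t in first_pred:
            # u and an earlier predecessor of t share the successor t.
            parent[find(u)] = find(first_pred[t])
        else:
            first_pred[t] = u
    classes = {}
    for n in nodes:
        classes.setdefault(find(n), set()).add(n)
    return list(classes.values())


def collapse(edges, classes):
    """Collapsed graph: every class inherits the edges of its members."""
    rep = {n: min(c) for c in classes for n in c}  # hypothetical representative
    return sorted({(rep[u], rep[t]) for u, t in edges})


# Hypothetical graph with one critical edge b -> j.
nodes = ["s", "a", "b", "j", "x", "e"]
edges = [("s", "a"), ("s", "b"), ("a", "j"), ("b", "j"),
         ("b", "x"), ("j", "e"), ("x", "e")]
classes = equivalence_classes(nodes, edges)
print(sorted(sorted(c) for c in classes))
print(collapse(edges, classes))
```

In this example a and b fall into one class because both lead to j, and j and x fall into one class because both lead to the end node. The collapsed graph is the chain that the outer schedule would traverse in postorder.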




Fig. 5. (a) Equivalent nodes (b) Collapsing equivalent nodes 

• The outer schedule: The flow graph $G'$ that results from the collapsing
preprocess is used in order to determine the round-robin schedule which
drives information backwards. It should be noted that the depth of $G'$ may
differ from the depth of the original flow graph $G$ in both directions: the
depth may increase or decrease by collapsing. This is illustrated in Fig. 6.
While collapsing nodes in Part a) decreases the depth, since the indicated
path is no longer acyclic, collapsing in Part b) allows the construction of a
longer acyclic path, as indicated by the dashed line connecting two
independent acyclic paths.





Fig. 6. (a) Decrease of depth due to collapsing of nodes (b) Increase of depth due to 
collapsing of nodes 

• The inner schedule: The first step of the inner schedule is quite trivial.
Considering a node $n \in N$ within the collapsed node under consideration and
an uncritical edge $(n, m) \in E$, the value of
$\mathit{DnSafe}_{\mathit{Hom}}(n)$ is changed to false if and only if
$\overline{\mathit{Transp}(n) \cdot \mathit{DnSafe}_{\mathit{Hom}}(m)}$ holds.

The second step of the inner schedule is the central innovation in our
approach and has to be elaborated with some care. Within any collapsed node
side propagation of down-safety information along critical edges is done
exhaustively by using a subiteration process. Information propagation here
means that for nodes $n, m$ in the collapsed node under consideration with
$m \in pred(succ(n))$ the value of $\mathit{DnSafe}_{\mathit{Hom}}(m)$ is
changed to false if and only if
$\overline{\mathit{DnSafe}_{\mathit{Hom}}(n)} \cdot \overline{\mathit{UpSafe}(n)}$
holds. The crucial point, however, is to organize the information flow along
the critical edges. The situation is easy if the zig-zag paths are acyclically
shaped as displayed in Fig. 7a or Fig. 7b. In this case the equivalence class
can be represented as a tree, which can already be built while preprocessing
this class. Following the topological order of the tree, information can be
propagated completely by a bottom-up traversal (from the leaves to the root)
followed by a top-down traversal (from the root to the leaves).

Unfortunately, in general there may be cycles of critical edges as shown in
Fig. 7c and Fig. 7d. Hence a problem of the same difficulty as in the backward
propagation of information shows up in the side-propagation step. However,
separating both problems is useful, as we expect nested cycles of critical
edges to be a phenomenon that is extremely rare in practice. Nonetheless,
coping with them is quite straightforward. As in the acyclic case, the
equivalence class can be represented as a tree with some additional non-tree
edges establishing cycles. The only difference to the non-cyclic case is that
the tree traversals have to be iterated more than once until the process gets
stable. To estimate the number of traversals we borrow the arguments from
conventional unidirectional analysis. Denoting the non-tree edges within the
tree-like representation of an equivalence class as critical backedges, the
number of iterations is bounded by $d_c$, where $d_c$ is the maximum number of
critical backedges along an acyclic path in any component representation.

Complexity of the Hybrid Approach: Altogether, the iteration of homogeneous
down-safety in the hybrid approach requires applying the outer schedule until
stabilization. Since the inner schedule propagates the information completely
within each collapsed node, the overall effort can be estimated by

(d' + 2)(e_u + 2(d_c + 2) e_c)

bit-vector steps, where e_u and e_c denote the number of uncritical and critical
edges, respectively, d' is the depth of the collapsed flow graph G', and d_c is the
critical depth as defined before.
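As a quick numeric illustration of the estimate above, the following sketch simply evaluates the formula for given graph parameters. The function and parameter names are hypothetical, and the sample numbers are made up for illustration only.

```python
def hybrid_bitvector_steps(d_collapsed: int, e_uncritical: int,
                           e_critical: int, d_critical: int) -> int:
    """Evaluate (d' + 2)(e_u + 2(d_c + 2) e_c) from the complexity estimate.

    d_collapsed  -- d', depth of the collapsed flow graph (assumed given)
    e_uncritical -- e_u, number of uncritical edges
    e_critical   -- e_c, number of critical edges
    d_critical   -- d_c, critical depth
    """
    return (d_collapsed + 2) * (e_uncritical + 2 * (d_critical + 2) * e_critical)


# Illustrative numbers only: 40 uncritical edges, 6 critical edges,
# collapsed depth 3, critical depth 1.
print(hybrid_bitvector_steps(3, 40, 6, 1))
# Without critical edges the estimate reduces to (d' + 2) * e_u.
print(hybrid_bitvector_steps(3, 40, 0, 0))
```

For fixed small d' and d_c the value grows linearly with the number of edges, which is the behaviour the text expects for real-life programs.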

It is commonly argued that the depth of a flow graph is a reasonably small 
constant in practice. We already discussed that d_c is at least as likely to be a
small constant, too. Hence the algorithm is expected to behave linearly in e for
real-life programs. In particular, we succeed in giving the first linear worst-case 
estimation for acyclic programs as in our introductory example of Fig. 0 

Note that there are no uncritical edges directly connecting different nodes of an
equivalence class.

Fig. 7. Shapes of equivalent nodes: (a) chain of critical edges, (b) tree of critical edges,
(c) cycle of critical edges and (d) structure with nested cycles of critical edges 

4.2 Lazy Code Motion 

Similar to the situation in Sect. IQ also the relevant analyses of LCM as defined 
in Sect. Id. 2l cannot naively be adapted to flow graphs with critical edges. Again 
the reason for this behaviour lies in a homogeneity defect, but now with respect 
to delayability. In fact, for flow graphs without critical edges we have 

Delayed(n) ⇒ (∀m ∈ succ(n). Delayed(m) ∨ ∀m ∈ succ(n). ¬Delayed(m))

This property may now be violated. Hence one has to force homogeneity ex- 
plicitly in order to yield an appropriate critical variant of lazy code motion. 
Therefore, let us consider the following notion of homogeneous delayability. 

Definition 2 (Homogeneous Delayability) . A predicate HDelayed on N is 
a homogeneous delayability predicate iff for any n ∈ N

1. HDelayed(n) conforms to delayability:

HDelayed(n) ⇒
Earliest_Hom(n) ∨ ((n ≠ s) ∧ ∀m ∈ pred(n). HDelayed(m) ∧ ¬Comp(m))

2. HDelayed(n) is homogeneous:

HDelayed(n) ⇒ (∀m ∈ succ(n). HDelayed(m) ∨ ∀m ∈ succ(n). ¬HDelayed(m))

Obviously, homogeneous delayability predicates are closed under “union”. Thus
there exists a unique largest homogeneous delayability predicate Delayed_Hom. This
gives rise to a new version of latestness characterizing the insertion points of lazy
code motion for flow graphs with critical edges (CLCM).

Latest_Hom(n) ⇔ Delayed_Hom(n) ∧ (Comp(n) ∨ ∃m ∈ succ(n). ¬Delayed_Hom(m))
Using the same definitions for lifetime optimality as in Sect. E3 we succeed in
proving lifetime optimality of CLCM. 
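The homogeneity condition and the latestness predicate defined above can be sketched directly over a successor map. This is only an illustration: the dictionaries, node names, and the sample delayability values are hypothetical, and the largest homogeneous delayability predicate is assumed to have been computed elsewhere.

```python
from typing import Dict, List


def is_homogeneous(pred: Dict[str, bool], succ: Dict[str, List[str]]) -> bool:
    """Check: pred(n) implies that all successors agree (all true or all false)."""
    for n, holds in pred.items():
        if not holds:
            continue
        values = [pred[m] for m in succ[n]]
        if not (all(values) or not any(values)):
            return False
    return True


def latest_hom(delayed: Dict[str, bool], comp: Dict[str, bool],
               succ: Dict[str, List[str]]) -> Dict[str, bool]:
    """Latest(n) <=> Delayed(n) and (Comp(n) or some successor is not delayed)."""
    return {
        n: delayed[n] and (comp[n] or any(not delayed[m] for m in succ[n]))
        for n in delayed
    }


# Hypothetical diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
delayed = {"a": True, "b": True, "c": True, "d": False}   # assumed analysis result
comp = {"a": False, "b": False, "c": False, "d": True}

print(is_homogeneous(delayed, succ))
print(latest_hom(delayed, comp, succ))
```

In this made-up example the insertion points are b and c, the last nodes to which the computation can be delayed before reaching d.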



Computing CLCM. In analogy to Sect. ^Dthe delayability property can either 
be coded into a bidirectional equation system or, more interestingly, again be 
expressed using a unidirectional formulation: 



Delayed_Hom(n) = Earliest_Hom(n) ∨

((n ≠ s) ∧ ∀m ∈ zpred(n). Delayed_Hom(m) ∧ ¬Comp(m))



This definition is based on zig-zag predecessors, which are defined completely
along the lines of zig-zag successors. However, in contrast to down-safety, zpred needs
not to be parameterized this time. Using this characterization the same tech- 
niques for hybrid iteration can be used as in Sect. im 

5 Conclusion 

We presented an adaptation of lazy code motion to flow graphs with critical edges
as a model of how to cope with bidirectional dependencies in code motion. On the
conceptual level we isolated homogeneity requirements as the source for bidirec- 
tional dependencies. This led to a new hybrid iteration strategy which is almost 
as fast as its unidirectional counterparts. This dramatically improves all known 
estimations for bidirectional bit-vector methods. Nonetheless, we still recommend
eliminating critical edges as far as possible, since critical edges are also
responsible for problems of a different flavour m- However, any implementation 
of code motion that has to cope with critical edges will definitely benefit from 
the ideas presented in this paper. 
Abstract. The bytecode verifier of the Java Virtual Machine, which
statically checks the type safety of Java bytecode, is the basis of the se- 
curity model of Java and guarantees the safety of mobile code sent from 
an untrusted remote host. However, the type system for Java bytecode 
has some technical problems, one of which is in the handling of sub- 
routines. Based on the work of Stata and Abadi and that of Qian, this 
paper presents yet another type system for Java Virtual Machine sub- 
routines. Our type system includes types of the form last(x). A value
whose type is last(x) is the same as that of the x-th variable of the caller
of the subroutine. In addition, we represent the type of a return address 
by the form return(n), which means returning to the n-th outer caller. 
By virtue of these types, we can analyze instructions purely in terms 
of type, and as a result the correctness proof of bytecode verification 
becomes extremely simple. Moreover, for some programs, our method 
is more powerful than existing ones. In particular, our method has no 
restrictions on the number of entries and exits of a subroutine. 



1 Introduction 

One contribution of Java is its bytecode verifier, which statically checks the 
type safety of bytecode for the JVM (Java Virtual Machine) prior to execution. 
Thanks to the bytecode verifier, bytecode sent from an untrusted remote host can 
be executed without the danger of causing type errors and destroying the entire 
security model of Java, even when the source code is not available. Verifying 
the type safety of bytecode (or native code) seems to be a new research area 
that is not only of technical interest but also of practical importance, due to the 
availability of remote binary code in web browsers and other applications. 

Much effort has been put into guaranteeing the security of Java programs, 
including the type safety of bytecode: 

— Security model for Java applets: The security model for Java applets is said 
to consist of three prongs: the bytecode verifier, the applet class loader and 
the security manager jOj. In this model, the bytecode verifier plays the most 
fundamental role, on which the other two prongs are based. If the bytecode 
verifier is cheated, the other two also become ineffective. 

G. Levi (Ed.): SAS’98, LNCS 1503, pp. 17-1321 1998. 

(c) Springer- Verlag Berlin Heidelberg 1998 
— Type safety of source code: The type safety of Java programs has been
proved, either formally (using a theorem proving assistant), or rigorously 
(but not formally) |JI4I17I1^ . This means that a program that has passed 
static type checking is guaranteed to cause no type errors while it is running. 

— Safety of class loading: Java allows classes to be loaded lazily, i.e., only when 
classes are actually accessed. In order to support various loading disciplines, 
Java allows programmers to define their own class loaders. This has opened 
one of the security holes in Java M- To avoid such a security hole, Dean
formalized part of the class loader functionality and formally proved its cor- 
rectness using PVS 0. Goldberg’s main concern is also class loading, though 
bytecode verification is addressed 0. 

— Type safety of bytecode: The bytecode verifier statically checks the type 
safety of Java bytecode. If the bytecode verifier accepts incorrectly typed 
bytecode, it will break the entire security model of Java. It guarantees that 
no type error occurs at each instruction by performing dataflow analysis on 
the bytecode. 



Besides the research into different aspects of security mentioned above, there
are also some on-going projects that are developing more secure network pro- 
gramming environments m. 

This paper concerns bytecode verification. Since this is the basis of the entire 
security model of Java, it is desirable to rigorously prove that any bytecode 
program that has passed bytecode verification will never cause a runtime type 
error. 

In order to be able to show the correctness of the bytecode verifier, one has 
to begin by formally specifying the operational semantics of the virtual machine 
(e.g., PP), and then give the formal specification of the bytecode verifier, based 
on its informal specification written in English |^. 

Qian rigorously defined the operational semantics of a subset of the JVM 
and formulated the bytecode verifier as a type system uni He then succeeded 
in proving the correctness of the bytecode verifier, though not completely. 

Bytecode verification of the JVM has some technical challenges. One is that 
of handling object initialization, as objects created but not yet initialized may 
open a security hole. In Qian’s work, much attention is paid to the handling of 
object initialization. 

Another is that of handling the polymorphism of subroutines. This paper 
addresses this issue. Inside a JVM subroutine, which is the result of compiling a 
finally clause in a try statement, local variables may have values of different 
types, depending on the caller of the subroutine. This is a kind of polymorphism. 

To investigate how to analyze JVM subroutines, Stata and Abadi defined a 
type system for a small subset of the JVM and proved its correctness with respect 
to the operational semantics of the subset m- Qian’s system is similar to that 
of Stata and Abadi in its handling of subroutines m- Both systems faithfully 
follow the specification of the bytecode verifier, and make use of information as 
to which variables are accessed or modified in a subroutine. Those variables that 
are not accessed or modified are simply ignored during analysis of the subroutine. 
This paper takes a different approach. We introduce types of the form last(x).
A value whose type is last(x) is the same as that of the x-th variable of the 
caller of the subroutine. In addition, we represent the type of a return address 
by the form return(n), which means returning to the n-th outer caller. 

Our approach has the following advantages. 

— By virtue of the last and return types, we can analyze instructions purely 
in terms of types. As a result, the proof of the correctness of bytecode ver- 
ification becomes extremely simple, and we do not need a separate analysis 
on variable access or modification. 

— For some programs (unfortunately, not those produced by the Java com- 
piler), our method is more powerful than existing ones. In particular, our 
method has no restrictions on the number of entries and exits of a sub- 
routine. Stata and Abadi enforce a stack-like behavior on subroutine calls 
with their analysis, which does not account for out-of-order returns from 
subroutines, though they are actually produced by the Java compiler. 

Due to these advantages, we hope that our method can be modified and applied 
to the analysis of other kinds of bytecode or native code Hum. 

This paper is organized as follows: In the next section, we explain JVM 
subroutines in more detail and our approach to bytecode verification. In Sect. 3, 
a subset of the JVM, similar to that of Stata and Abadi, is defined. In Sect. 4, 
our method for analysis of bytecode is described and its correctness is shown. In 
Sect. 5, issues of implementation are briefly discussed. Section 6 offers concluding 
remarks. 

2 Analysis of Subroutines 

The JVM is a classical virtual machine consisting of 

— a program counter, 

— an array for storing the values of local variables, 

— a stack for placing arguments and results of operators, called the operand 
stack, 

— a heap for storing method code and object bodies, and 

— a stack for frames, each of which is allocated for each invocation of a method 
and consists of the program counter, the local variables, and the operand 
stack. 

To allow for checking the type safety of bytecode, it prepares different instruc- 
tions for the same operation depending on the type of the operands. For example, 
it has the instruction istore for storing integers and the instruction fstore for
storing floating-point numbers. 

JVM subroutines are used mainly for compiling the finally clauses of try 
statements of Java. Notice that subroutine calls are completely different from 
method calls. A subroutine is locally defined inside a method and, unlike a 
method, is not allowed to call itself recursively. 
In this paper, we define a virtual machine based on JVML0, which was for-
mulated by Stata and Abadi for investigating how to analyze JVM subroutines.
Subroutines are called by instructions of the following form. 

jsr(L)

An instruction of this form pushes the address of its next instruction (i.e., the 
return address) onto the operand stack and jumps to the subroutine L. Subrou- 
tines are usually defined as follows. 

L : store(x)
    ret(x)

store(x) pops a value from the operand stack and stores it in the x-th local
variable. ret(x) is an instruction for jumping to the address stored in the x-th
local variable. Note that, in contrast to the JVM, our virtual machine has only 
one instruction for the store operation. 

Subroutines in the JVM have made bytecode verification more difficult for 
the following reasons. 

— The return address of ret(x) can only be determined after the values of the 
local variables have been analyzed by the bytecode verifier. On the other 
hand, return addresses affect the control flow and the analysis of local vari- 
ables. 

— In some situations, a subroutine does not return to the immediate caller, but 
returns to an outer caller, such as the caller of the caller. 

— Inside a subroutine, local variables may have values of different types, de- 
pending on the caller of the subroutine. 

In this paper, we introduce types of the form last(x) in order to address the 
last problem. A value having this type must have the same value as that of the 
x-th local variable in the caller of a subroutine. 

As an example, let us consider the following program. 

    const0            7 : store(0)
    store(1)              load(1)
2 : jsr(7)                store(2)
    constNULL             ret(0)
    store(1)
5 : jsr(7)
    halt

Subroutine 7 is called from two callers (2 and 5). The return address is stored 
in variable 0 (the 0-th local variable). The value of variable 1 is an integer 
when the subroutine is called from caller 2, and is an object pointer when called 
from caller 5. This is a typical case in which a subroutine is polymorphic. In this 
program, the value of variable 1 is copied to variable 2. As the JVM has different
store instructions depending on the type of the operand, the above program is 
impossible under the JVM. However, even if the JVM allowed the untyped store 
instruction, the bytecode verifier of the JVM would signal an error for the above 
program, because it always assigns a unique type for any variable accessed in a 
subroutine. 

According to the specification of the JVM jSj, 

— For each instruction and each jsr needed to reach that instruction, a bit 
vector is maintained of all local variables accessed or modified since the 
execution of the jsr instruction. 

— For any local variable for which the bit vector (constructed above) indi- 
cates that the subroutine has accessed or modified, use the type of the local 
variable at the time of the ret. 

— For other local variables, use the type of the local variable before the jsr 
instruction. 

The work of Stata and Abadi and that of Qian faithfully follow this specification. 
Local variables that are accessed or modified in each subroutine are recorded. 
Those variables that are not accessed or modified are simply ignored during the 
subsequent analysis of the subroutine. 

The method proposed in this paper assigns a type of the form last(l) to 
variable 1 in subroutine 7. This means that it includes a value passed from 
variable 1 of the caller. This type is propagated through instructions in the 
subroutine. In particular, by the instruction store(2), the type of variable 2 
becomes last(l). This information is then used when the control returns from 
the subroutine to the caller. In this way, the polymorphism of local variables in 
a subroutine is expressed by types of the form last(x). 

In our method, return addresses have types of the form return(n). A type 
of the form return(n) means to return to the n-th outer caller. For example, 
the address returning to the immediate caller has the type return(1), while the
address returning to the caller of the caller has the type return(2).

In Stata and Abadi’s work (and similarly in Qian’s work), return addresses 
have types of the form (ret-from L), where L is the address of the subroutine, 
from which the callers of the subroutines are obtained. In our analysis, the 
callers are obtained from the set of histories assigned to each instruction (cf. 
Sect. 4.3). 

3 Virtual Machine 

In this section we formalize a subset of the JVM, which resembles that of Stata 
and Abadi HE!. Differences are mainly for the examples by which we want to 
show the power of our framework. 
3.1 Values

A value is a return address or an integer or an object pointer. We can easily 
add other kinds of values, such as floating-point numbers. In the following
formal treatment of the operational semantics of the virtual machine, a return 
address has the constructor retaddr, an integer the constructor intval, and an 
object pointer the constructor objval. They all take an integer as an argument. 



3.2 Instructions 

A bytecode program is a list of instructions. An instruction takes one of the 
following formats. 



jsr(L)        (L: subroutine address)
ret(x)        (x: variable index)
load(x)       (x: variable index)
store(x)      (x: variable index)
const0
constNULL
inc(x)        (x: variable index)
if0(L)        (L: branch address)
ifNULL(L)     (L: branch address)
halt

Each mnemonic is considered as a constructor of instructions. Some of the
mnemonics take a nonnegative integer x or L as an operand.



3.3 Operational Semantics 

The virtual machine consists of 

— the program, which is a list of instructions and denoted by P, 

— the program counter, which is an index to P, 

— the local variables, where the list of values of the local variables is denoted
by f, and

— the operand stack, denoted by s.

Let us use the notation l[i] for extracting the i-th element of list l, where the first
element of l has the index 0. The i-th instruction of the program P is denoted
by P[i]. The value of the x-th local variable is denoted by f[x]. The p-th element
of the operand stack s is denoted by s[p], where s[0] denotes the top element of
s.

As in the work by Stata and Abadi, the operational semantics of the virtual 
machine is defined as a transition relation between triples of the form (i, f, s),
where i is the program counter, i.e., the index to the program P, f the value
list of the local variables, and s the operand stack. While the length of s may
change during execution of the virtual machine, the length of f, i.e., the number
of local variables is unchanged. The program P, of course, never changes during 
execution. 

The transition relation is defined as follows. 
— If P[i] = jsr(L), then (i, f, s) → (L, f, retaddr(i + 1)::s).

The return address retaddr(i + 1) is pushed onto the operand stack.
The operator :: is the cons operator for lists.

— If P[i] = ret(x) and f[x] = retaddr(j + 1), then (i, f, s) → (j + 1, f, s).

— If P[i] = load(x), then (i, f, s) → (i + 1, f, f[x]::s).

— If P[i] = store(x), then (i, f, v::s) → (i + 1, f[x ↦ v], s).

The notation f[x ↦ v] means a list whose elements are the same as those of f
except for the x-th element, which is set to v.

— If P[i] = const0, then (i, f, s) → (i + 1, f, intval(0)::s).

— If P[i] = constNULL, then (i, f, s) → (i + 1, f, objval(0)::s).

— If P[i] = inc(x) and f[x] = intval(k), then
(i, f, s) → (i + 1, f[x ↦ intval(k + 1)], s).

— If P[i] = if0(L), then (i, f, intval(0)::s) → (L, f, s).

If P[i] = if0(L) and k ≠ 0, then (i, f, intval(k)::s) → (i + 1, f, s).

— If P[i] = ifNULL(L), then (i, f, objval(0)::s) → (L, f, s).

If P[i] = ifNULL(L) and k ≠ 0, then (i, f, objval(k)::s) → (i + 1, f, s).

The transition relation → is considered as the least relation satisfying the above
conditions. 

The relation is defined so that when a type error occurs, no transition is 
defined. This means that to show the type safety of bytecode is to show that a 
transition sequence stops only at the halt instruction. 
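The transition relation on triples can be sketched as a small interpreter. This is an illustrative encoding, not part of the formal development: values and instructions are represented as Python tuples (an assumption), `step` returns `None` exactly when no transition is defined, variable indices are assumed to be in range, and the initial contents of the local variables are chosen arbitrarily.

```python
def step(P, state):
    """Return the successor of (i, f, s), or None if no transition is defined."""
    i, f, s = state
    if not 0 <= i < len(P):
        return None
    op = P[i]
    name = op[0]
    if name == "jsr":
        return (op[1], f, (("retaddr", i + 1),) + s)
    if name == "ret":
        kind, addr = f[op[1]]
        return (addr, f, s) if kind == "retaddr" else None
    if name == "load":
        return (i + 1, f, (f[op[1]],) + s)
    if name == "store":
        if not s:
            return None
        x = op[1]
        return (i + 1, f[:x] + (s[0],) + f[x + 1:], s[1:])
    if name == "const0":
        return (i + 1, f, (("intval", 0),) + s)
    if name == "constNULL":
        return (i + 1, f, (("objval", 0),) + s)
    if name == "inc":
        x = op[1]
        kind, k = f[x]
        if kind != "intval":
            return None  # type error: no transition
        return (i + 1, f[:x] + (("intval", k + 1),) + f[x + 1:], s)
    if name in ("if0", "ifNULL"):
        wanted = "intval" if name == "if0" else "objval"
        if not s or s[0][0] != wanted:
            return None
        return (op[1] if s[0][1] == 0 else i + 1, f, s[1:])
    return None  # halt has no successor


def run(P, state, max_steps=1000):
    """Follow transitions until none is defined; return the last state."""
    for _ in range(max_steps):
        nxt = step(P, state)
        if nxt is None:
            return state
        state = nxt
    raise RuntimeError("step limit exceeded")


# The example program of Sect. 2 (addresses 0..10).
PROGRAM = [
    ("const0",), ("store", 1), ("jsr", 7), ("constNULL",), ("store", 1),
    ("jsr", 7), ("halt",), ("store", 0), ("load", 1), ("store", 2), ("ret", 0),
]
final = run(PROGRAM, (0, (("intval", 0),) * 3, ()))
print(final)
```

The run ends at address 6, the halt instruction, so this particular execution stops without a type error; a stuck state at any other instruction would indicate one.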

For proving the correctness of our bytecode analysis, we also need another 
version of the operational semantics that maintains invocation histories of sub- 
routines. This semantics corresponds to the structured dynamic semantics of 
Stata and Abadi. The transition relation is now defined for quadruples of the 
form (i, f, s, h), where the last component h is an invocation history of subroutines.
It is a list of addresses of callers of subroutines. This component is only
changed by the jsr and ret instructions. 

— If P[i] = jsr(L), then (i, f, s, h) → (L, f, retaddr(i + 1)::s, i::h).

Note that the address i of the caller of the subroutine is pushed onto the
invocation history.

— If P[i] = ret(x), f[x] = retaddr(j + 1) and h = h'@[j]@h'', where j does
not appear in h', then (i, f, s, h) → (j + 1, f, s, h'').

The operator @ is the append operator for lists. 

For other instructions, the invocation histories before and after transition are 
the same. 

As for the two transition relations, we immediately have the following propo- 
sition. 

Proposition 1: If (i, f, s, h) → (i', f', s', h'), then (i, f, s) → (i', f', s').
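The way jsr and ret change the invocation history can be sketched in isolation. The helper names below are hypothetical; histories are Python lists with the innermost caller first, and the nested call from address 8 is a made-up situation used only to show a return to an outer caller.

```python
def jsr_history(i, h):
    """History after executing jsr at address i: the caller i is pushed."""
    return [i] + h


def ret_history(return_address, h):
    """History after returning to return_address = j + 1.

    Splits h as h' @ [j] @ h'' with j not in h' and returns h'',
    or None if j does not occur in h (no transition is defined).
    """
    j = return_address - 1
    if j not in h:
        return None
    return h[h.index(j) + 1:]  # index() finds the first j, so j is not in h'


h1 = jsr_history(2, [])        # subroutine called from address 2
h2 = jsr_history(8, h1)        # hypothetical nested call from address 8
print(h2)
print(ret_history(9, h2))      # return to the immediate caller
print(ret_history(3, h2))      # return directly to the outer caller
```

Returning to address 3 discards both entries at once, which is how the semantics accounts for a subroutine that returns to the caller of its caller.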

4 Analysis 

4.1 Types 

Types in our analysis are among the following syntactic entities: 
⊤, ⊥ (top and bottom)            INT, OBJ, ... (basic types)

return(n) (n: caller level)      last(x) (x: variable index)

A type is ⊤, ⊥, a basic type, a return type, or a last type. In this paper,
we assume as basic types INT, the type of integers, and OBJ, the type of ob- 
ject pointers. It is easy to add other basic types, such as that of floating point 
numbers. 

return types and last types are only meaningful inside a subroutine. A 
return type is the type of a return address. For positive integer n, return(n) 
denotes the type of the address for returning to the n-th outer caller. For example,
return(1) denotes the type of the address for returning to the direct caller
of the subroutine, and return(2) the type of the address for returning to the
caller of the caller. 

A last type means that a value is passed from the caller of the subroutine. 
For nonnegative integer x, last(x) denotes the type of a value that was stored 
in the x-th local variable of the caller. A value can have this type only when it 
is exactly the same as the value of the x-th local variable when the subroutine 
was called. 

4.2 Order among Types 

We define the order among types as follows. 

⊤ ≥ INT ≥ ⊥        ⊤ ≥ OBJ ≥ ⊥

⊤ ≥ return(n) ≥ ⊥  ⊤ ≥ last(x) ≥ ⊥

Since we do not distinguish object pointers by their classes in this paper, the
order is flat, with ⊤ and ⊥ as the top and bottom elements.

This order is extended to lists of types. For type lists t_1 and t_2, t_1 ≥ t_2 holds
if and only if t_1 and t_2 are of the same length and t_1[i] ≥ t_2[i] holds for any i
ranging over the indices for the lists.
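The flat order and its extension to type lists can be sketched as follows. Types are encoded as Python tuples (an assumption), and the order is read as the reflexive one, since the rules below relate equal type lists at consecutive instructions.

```python
TOP = ("top",)
BOT = ("bot",)
INT = ("INT",)
OBJ = ("OBJ",)


def geq(a, b):
    """a >= b in the flat order: equal, or a is top, or b is bottom."""
    return a == b or a == TOP or b == BOT


def list_geq(t1, t2):
    """Pointwise extension to type lists of the same length."""
    return len(t1) == len(t2) and all(geq(a, b) for a, b in zip(t1, t2))


print(geq(TOP, ("return", 1)))                    # top is above every type
print(geq(INT, OBJ))                              # distinct basic types are unrelated
print(list_geq([TOP, INT, TOP], [INT, INT, ("last", 2)]))
print(list_geq([TOP], [INT, INT]))                # lengths differ
```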

4.3 Target of Analysis 

The target of our bytecode analysis is to obtain the following pieces of information
for the i-th instruction of the given program P.

F_i, S_i, H_i

F_i is a type list. F_i[x] describes the type of f[x], i.e., the value of the x-th local
variable of the virtual machine. S_i is also a type list. Each element of S_i
describes the type of the corresponding element of the operand stack of the virtual
machine. Both F_i and S_i describe the types of the components of the virtual
machine just before the i-th instruction is executed. H_i is a set of invocation
histories for the i-th instruction.

F, S and H should follow a rule that is defined for each kind of P[i]. The
rule says that certain conditions must be satisfied before and after the execution
of P[i].
Rule for jsr) If h ∈ H_i and P[i] = jsr(L), then the following conditions must
be satisfied.

— 0 ≤ L < |P|.

— For each variable index y, either

    F_L[y] ≥ return(n + 1)     (if F_i[y] = return(n)), and
    F_L[y] ≥ F_i[y]            (if F_i[y] is neither return nor last)

  or

    F_L[y] ≥ last(y)           (even if F_i[y] is last).

— |S_L| = |S_i| + 1, where |l| denotes the length of list l.

— S_L[0] ≥ return(1).

— For each index p, where 0 ≤ p < |S_i|,

    S_i[p] is not last,
    S_L[p + 1] ≥ S_i[p]          (if S_i[p] is not return), and
    S_L[p + 1] ≥ return(n + 1)   (if S_i[p] = return(n)).

— i does not appear in h. (Recursion is not allowed.)

— i::h ∈ H_L.

Note that when F_i[y] is not last, F_L[y] cannot be determined uniquely. We must
make a nondeterministic choice between return(n + 1) and F_i[y]. See Sect. 5 for
more discussions on the implementation of the analysis.
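The choice left open by the jsr rule for local variables can be made concrete with a small sketch. The function name and the tuple encoding of types are hypothetical. It follows the reading suggested by the note above: a variable typed last at the call site can only become last(y) inside the subroutine, while any other variable may either keep its type (with return levels shifted by one) or become last(y). Only the least admissible types are listed; any greater type is also allowed by the rule.

```python
TOP = ("top",)
INT = ("INT",)


def jsr_local_candidates(t, y):
    """Least admissible types for F_L[y], given F_i[y] = t at a jsr(L)."""
    last_y = ("last", y)
    if t[0] == "last":
        return [last_y]
    if t[0] == "return":
        return [("return", t[1] + 1), last_y]
    return [t, last_y]


# Caller state [top, INT, top], as at instruction 2 of the running example.
caller_locals = [TOP, INT, TOP]
for y, t in enumerate(caller_locals):
    print(y, jsr_local_candidates(t, y))
```

Picking the first candidate for every variable, or the second for every variable, gives two different typings of the subroutine entry for the same caller.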

The following figures show the two possibilities for typing local variables
inside a subroutine. In this example, it is assumed that there is only one caller (2)
of subroutine 7. The column [F_i[0], F_i[1], F_i[2]] shows the types of local variables
before each instruction is executed. At subroutine 7, it is set to [⊤, INT, ⊤] or
[l(0), l(1), l(2)]. There are more possibilities. For example, one could also set it
to [⊤, ⊤, ⊤], but this possibility is subsumed by the first.



 i  instruction   [F_i[0], F_i[1], F_i[2]]  S_i      H_i      [F_i[0], F_i[1], F_i[2]]  S_i      H_i

 0  const0        [⊤, ⊤, ⊤]                 []       {[]}     [⊤, ⊤, ⊤]                 []       {[]}
 1  store(1)      [⊤, ⊤, ⊤]                 [INT]    {[]}     [⊤, ⊤, ⊤]                 [INT]    {[]}
 2  jsr(7)        [⊤, INT, ⊤]               []       {[]}     [⊤, INT, ⊤]               []       {[]}
 3  constNULL     [⊤, INT, INT]             []       {[]}     [⊤, INT, INT]             []       {[]}
 7  store(0)      [⊤, INT, ⊤]               [r(1)]   {[2]}    [l(0), l(1), l(2)]        [r(1)]   {[2]}
 8  load(1)       [r(1), INT, ⊤]            []       {[2]}    [r(1), l(1), l(2)]        []       {[2]}
 9  store(2)      [r(1), INT, ⊤]            [INT]    {[2]}    [r(1), l(1), l(2)]        [l(1)]   {[2]}
10  ret(0)        [r(1), INT, INT]          []       {[2]}    [r(1), l(1), l(1)]        []       {[2]}




Rule for ret) If h ∈ H_i and P[i] = ret(x), then the following conditions must
be satisfied.

— F_i[x] = return(n).

— h = h'@[j]@h'', where |h'| = n − 1.

— 0 ≤ j + 1 < |P|.

— For each variable index y,

    F_{j+1}[y] ≥ follow_last(n, h, F_i[y]).

— S_{j+1} ≥ follow_last(n, h, S_i).

— h'' ∈ H_{j+1}.

follow_last is a function for extracting the type of a variable in a caller of a
subroutine according to an invocation history. For nonnegative integer n, invocation
history h and type t, follow_last(n, h, t) is defined as follows.

follow_last(0, h, t) = t

follow_last(n + 1, i::h, return(m)) =
    if m > n + 1 then return(m − n − 1) else ⊤

follow_last(n + 1, i::h, last(x)) = follow_last(n, h, F_i[x])

follow_last(n + 1, i::h, t) = t (otherwise)

follow_last is extended to type lists, i.e., follow_last(n, h, t) is also defined when
t is a type list.
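The definition of follow_last translates almost literally into a recursive function. In this sketch types are tuples, F is a dictionary from instruction addresses to type lists, and the history is assumed to have at least n elements; these encodings are illustrative assumptions. The sample values of F_2, F_5 and F_10 are taken from the analysis result of the example in Sect. 2.

```python
TOP = ("top",)
INT = ("INT",)
OBJ = ("OBJ",)


def follow_last(n, h, t, F):
    """Type of t as seen n call levels further out, following history h."""
    if n == 0:
        return t
    i, rest = h[0], h[1:]          # h = i::rest
    if t[0] == "return":
        m = t[1]
        return ("return", m - n) if m > n else TOP
    if t[0] == "last":
        return follow_last(n - 1, rest, F[i][t[1]], F)
    return t


def follow_last_list(n, h, ts, F):
    """Extension of follow_last to type lists."""
    return [follow_last(n, h, t, F) for t in ts]


# Types of the local variables at the two call sites and at ret(0).
F = {2: [TOP, INT, TOP], 5: [TOP, OBJ, INT]}
F10 = [("return", 1), ("last", 1), ("last", 1)]

print(follow_last_list(1, [2], F10, F))   # returning to caller 2
print(follow_last_list(1, [5], F10, F))   # returning to caller 5
```

The same type list at the ret instruction thus resolves to integer types for one caller and to object types for the other, which is the polymorphism the last types are meant to express.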

Rule for load) If h ∈ H_i and P[i] = load(x), then the following conditions
must be satisfied.

— 0 ≤ i + 1 < |P|. F_{i+1} ≥ F_i. S_{i+1} ≥ F_i[x]::S_i. h ∈ H_{i+1}.

Rule for store) If h ∈ H_i and P[i] = store(x), then the following conditions
must be satisfied.

— 0 ≤ i + 1 < |P|. S_i = t::t'. F_{i+1} ≥ F_i[x ↦ t]. S_{i+1} ≥ t'. h ∈ H_{i+1}.

Rule for const0) If h ∈ H_i and P[i] = const0, then the following conditions
must be satisfied.

— 0 ≤ i + 1 < |P|. F_{i+1} ≥ F_i. S_{i+1} ≥ INT::S_i. h ∈ H_{i+1}.

The rule for constNULL is similar.

Rule for inc) If h ∈ H_i and P[i] = inc(x), then the following conditions must
be satisfied.

— 0 ≤ i + 1 < |P|. F_i[x] = INT. F_{i+1} ≥ F_i. S_{i+1} ≥ S_i. h ∈ H_{i+1}.

Rule for if0) If h ∈ H_i and P[i] = if0(L), then the following conditions must
be satisfied.

— 0 ≤ L < |P|. 0 ≤ i + 1 < |P|.

— S_i = INT::t. F_L ≥ F_i. F_{i+1} ≥ F_i. S_L ≥ t. S_{i+1} ≥ t.

— h ∈ H_L. h ∈ H_{i+1}.

The rule for ifNULL is similar.

Rule for halt) There is no rule for halt.
4.4 Correctness of Analysis

In order to state the correctness of our analysis, we first introduce the following 
relation. 

(v, h) : t

v is a value, h is an invocation history, and t is a type. By (v, h) : t, we mean that
the value v belongs to the type t provided that v appears with the invocation
history h. Following is the definition of this relation.

— (v, h) : ⊤.

— (intval(k), h) : INT.

— (objval(k), h) : OBJ.

— If h[n − 1] = j, then (retaddr(j + 1), h) : return(n).

— If (v, h) : F_i[x], then (v, i::h) : last(x).

This definition is also inductive, i.e., (v,h) : t holds if and only if it can be 
derived only by the above rules. 
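The inductive definition of (v, h) : t can be read as a recursive membership test. The sketch below uses tuple encodings for values and types and a dictionary F from addresses to type lists; these are illustrative assumptions, and the sample F entries are the call-site types of the example in Sect. 2.

```python
TOP = ("top",)
INT = ("INT",)
OBJ = ("OBJ",)


def has_type(v, h, t, F):
    """Decide (v, h) : t by following the five defining clauses."""
    kind = t[0]
    if kind == "top":
        return True
    if kind == "INT":
        return v[0] == "intval"
    if kind == "OBJ":
        return v[0] == "objval"
    if kind == "return":
        n = t[1]
        # retaddr(j + 1) has type return(n) iff h[n - 1] = j
        return v[0] == "retaddr" and 1 <= n <= len(h) and h[n - 1] == v[1] - 1
    if kind == "last":
        # (v, i::h') : last(x) iff (v, h') : F_i[x]
        return len(h) > 0 and has_type(v, h[1:], F[h[0]][t[1]], F)
    return False  # no value belongs to the bottom type


F = {2: [TOP, INT, TOP], 5: [TOP, OBJ, INT]}
print(has_type(("intval", 0), [2], ("last", 1), F))    # integer passed from caller 2
print(has_type(("objval", 0), [2], ("last", 1), F))    # wrong kind for caller 2
print(has_type(("objval", 0), [5], ("last", 1), F))    # object passed from caller 5
print(has_type(("retaddr", 3), [2], ("return", 1), F))
```

The recursion on last types terminates because each step removes the head of the history.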

We have two lemmas. 

Lemma 1: If (v, h) : t and t' ≥ t, then (v, h) : t'.

Lemma 2: Let h' be a prefix of h of length n and h'' be its corresponding suffix,
i.e., h = h'@h'' and |h'| = n. If (v, h) : t, then (v, h'') : follow_last(n, h, t).

We say that the quadruple (i, f, s, h) is sound with respect to (F, S, H) and
write (i, f, s, h) : (F, S, H), if the following conditions are satisfied.

— 0 ≤ i < |P|.

— For each variable index y, (f[y], h) : F_i[y].

— For each index p for s, (s[p], h) : S_i[p].

— h ∈ H_i.

— h does not have duplication, i.e., no element of h occurs more than once in 
h. 

We have the following correctness theorem. It says that if F, S and H follow
the rule for each instruction of P, then the soundness is preserved under
the transition of quadruples. This means that if the initial quadruple is sound, 
then quadruples that appear during execution of the virtual machine are always 
sound. 

Theorem (correctness of analysis): Assume that F, S and H follow the rule
for each instruction of P. If (i, f, s, h) : (F, S, H) and (i, f, s, h) → (i', f', s', h'),
then (i', f', s', h') : (F, S, H).

The theorem is proved by case analysis on the kind of P[i]. In this short
paper, we only examine the case when P[i] = ret(x).

Assume that (i, f, s, h) : (F, S, H) and (i, f, s, h) → (i', f', s', h'). Since F, S
and H follow the rule for ret, the following facts hold.

(i) F_i[x] = return(n).

(ii) h = h_1@[j]@h_2, where |h_1| = n − 1.

(iii) 0 ≤ j + 1 < |P|.

(iv) For each variable index y,

    F_{j+1}[y] ≥ follow_last(n, h, F_i[y]).

(v) S_{j+1} ≥ follow_last(n, h, S_i).

(vi) h_2 ∈ H_{j+1}.

By (i) and the soundness of (i, f, s, h), (f[x], h) : return(n). Therefore, by (ii),
f[x] = retaddr(j + 1) and i' = j + 1. Moreover, since h does not have duplication,
h_1 does not contain j. This implies that h' = h_2. We also have that f' = f and
s' = s.

Let us check the conditions for the soundness of (i', f', s', h') = (j + 1, f, s, h_2).

— By (iii), 0 ≤ i' < |P|.

— By (iv), F_{i'}[y] ≥ follow_last(n, h, F_i[y]). By the soundness of (i, f, s, h),
(f[y], h) : F_i[y]. By Lemma 2, (f[y], h') : follow_last(n, h, F_i[y]). Therefore,
by Lemma 1, (f[y], h') : F_{i'}[y].

— Similarly, by (v), we have that (s[p], h') : S_{i'}[p].

— By (vi) and since h' = h_2, h' ∈ H_{i'}.

— Finally, since h does not have duplication, h' does not have duplication, 
either. 



Proposition 2: If (i, f, s, h) : (F, S, H) and (i, f, s) → (i', f', s'), then there
exists some h' such that (i, f, s, h) → (i', f', s', h').

The only case that must be examined is that of ret. Note that h' is uniquely 
determined. 

The above proposition guarantees that if F, S and H follow the rule for each
instruction and the initial quadruple (i, f, s, h) is sound, then the transition
sequence starting from the triple (i, f, s) can always be lifted to a sequence
starting from (i, f, s, h). This means that the semantics for triples and that for
quadruples coincide when F, S and H follow the rule for each instruction. A
similar lemma is stated in m, which establishes a correspondence between their 
stackless semantics and their structured semantics. 

Lemma 3: If (i, f, s, h) : (F, S, H), then (i, f, s, h) has the next state unless
P[i] = halt.

The following final theorem, derived from the above lemma and the previous 
theorem, guarantees the type safety of bytecode. This corresponds to Theorem 1 
(Soundness) in [TT^. 

Theorem (type safety): If F, S and H follow the rule for each instruction 
and the initial quadruple (i, f, s, h) is sound, then a transition sequence stops
only at the halt instruction. 

4.5 Example

Let us abbreviate last(x) and return(n) by l(x) and r(n), respectively. The
following is the result of analyzing the example in Sect. 2.



 i   instruction   [F_i[0], F_i[1], F_i[2]]   S_i      H_i
 0   const0        [⊤, ⊤, ⊤]                  []       {[]}
 1   store(1)      [⊤, ⊤, ⊤]                  [INT]    {[]}
 2   jsr(7)        [⊤, INT, ⊤]                []       {[]}
 3   constNULL     [⊤, INT, INT]              []       {[]}
 4   store(1)      [⊤, INT, INT]              [OBJ]    {[]}
 5   jsr(7)        [⊤, OBJ, INT]              []       {[]}
 6   halt          [⊤, OBJ, OBJ]              []       {[]}
 7   store(0)      [l(0), l(1), l(2)]         [r(1)]   {[2], [5]}
 8   load(1)       [r(1), l(1), l(2)]         []       {[2], [5]}
 9   store(2)      [r(1), l(1), l(2)]         [l(1)]   {[2], [5]}
10   ret(0)        [r(1), l(1), l(1)]         []       {[2], [5]}



The rule for ret(0) at 10 is satisfied because, for [2] ∈ H_10,

- F_10[0] = return(1). [2] = [] @ [2] @ []. 0 ≤ 2+1 = 3 < 11.

- F_3[0] = ⊤ ≥ follow_last(1, [2], F_10[0]) =
follow_last(1, [2], return(1)) = ⊤.

- F_3[1] = INT ≥ follow_last(1, [2], F_10[1]) =
follow_last(1, [2], last(1)) = F_2[1] = INT.

- F_3[2] = INT ≥ follow_last(1, [2], F_10[2]) =
follow_last(1, [2], last(1)) = F_2[1] = INT.

- S_3 = []. [] ∈ {[]} = H_3,

and similarly for [5] ∈ H_10.
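
The check above can be replayed mechanically. The following is only a sketch under stated assumptions: the definition of follow_last is not repeated in this section, so the function below is a hypothetical reading inferred from the worked example (a last(x) type is resolved through the jsr recorded in the history, and return(m) becomes ⊤ once its level has been consumed). The flat ordering with ⊤ on top, the tuple encoding of types, and the names `leq` and `check_ret` are likewise illustrative. The stack condition (v) is omitted because the stack is empty at 10.

```python
TOP = "TOP"  # stands for the top type


def leq(a, b):
    """a is below or equal to b, assuming a flat ordering with TOP on top."""
    return b == TOP or a == b


def follow_last(n, h, t, F):
    """Hypothetical follow_last: resolve type t across the n most recent calls in h."""
    if n == 0:
        return t
    if isinstance(t, tuple) and t[0] == "last":
        j = h[0]  # address of the most recent jsr
        return follow_last(n - 1, h[1:], F[j][t[1]], F)
    if isinstance(t, tuple) and t[0] == "return":
        m = t[1]
        return ("return", m - n) if m > n else TOP
    return t


def check_ret(i, x, h, F, H, program_len):
    """Check conditions (i)-(iv) and (vi) of the ret rule for one history h."""
    t = F[i][x]
    if not (isinstance(t, tuple) and t[0] == "return"):
        return False
    n = t[1]
    if len(h) < n:
        return False
    j, h2 = h[n - 1], h[n:]          # h = h1 @ [j] @ h2 with |h1| = n - 1
    target = j + 1
    if not 0 <= target < program_len:
        return False
    vars_ok = all(
        leq(follow_last(n, h, F[i][y], F), F[target][y])
        for y in range(len(F[i]))
    )
    return vars_ok and h2 in H[target]


# Rows of the table that the ret(0) check needs.
F = {
    2: [TOP, "INT", TOP],
    3: [TOP, "INT", "INT"],
    5: [TOP, "OBJ", "INT"],
    6: [TOP, "OBJ", "OBJ"],
    10: [("return", 1), ("last", 1), ("last", 1)],
}
H = {3: {()}, 6: {()}}
results = [check_ret(10, 0, h, F, H, 11) for h in [(2,), (5,)]]
```

Both histories in H_10 pass; changing F_3[2] to OBJ makes the check for [2] fail, since INT is then not below F_3[2].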



4.6 Returning to an Outer Caller 

When P[i] = ret(x) returns to the caller of the caller, for example, F_i[x] must
be equal to the type return(2). In this case, H_i should consist of histories of
length at least 2.

If [j_1, j_2, j_3, ...] is in H_i, P[i] returns to i' = j_2+1 and F_{i'} should satisfy

    F_{i'}[y] ≥ follow_last(2, [j_1, j_2, j_3, ...], F_i[y]).

If F_i[y] = last(y) and F_{j_1}[y] = last(y), for example, then

    follow_last(2, [j_1, j_2, j_3, ...], F_i[y]) = F_{j_2}[y].

This is how information at j_2 (which is a jsr) is propagated to i' = j_2+1. If
F_i[y] is not last, information at i is propagated.

5 Implementation

A dataflow analysis is usually implemented by an iterative algorithm. For each
instruction, we check if the rule for the instruction is satisfied by F, S and H.
If not, we update F, S and H accordingly, and check the next instruction that
is affected by the update.

There are two problems in implementing our analysis by such an iterative
algorithm. Firstly, the rule for jsr(L) does not uniquely determine F_L[y] when
F_i[y] is not last. We have two choices: one is to set F_L[y] = last(y), and the
other is to set F_L[y] = F_i[y] (or F_L[y] = return(n + 1) if F_i[y] = return(n)).
In our current implementation, we first set F_L[y] = last(y) and proceed with the
analysis. If the analysis fails at some point because F_L[y] = last(y), we take the
alternative and redo the analysis from L. (We need not completely abandon the
work done after we set F_L[y] = last(y).)

The second problem is that by a naive iterative algorithm, a subroutine is 
analyzed each time it is called. In the worst case this may require an exponential 
number of steps with respect to the length of the program. This problem can be 
avoided by representing invocation histories by a node in the call graph of the 
program, which is a graph whose nodes are addresses of subroutines and whose 
(directed) edges are labeled with addresses of jsr instructions. Since the JVM 
does not allow recursion, it is a connected acyclic graph with a unique root node 
representing the initial address. 

A path from the root to a node in the graph corresponds to an invocation 
history by concatenating the labels of edges in the path. Each node in the graph 
then represents the set of all the invocation histories from the root to the node. 
Now, instead of keeping a set of invocation histories (i.e., Hi), we can keep a set 
of nodes in the graph. 

From the following program, a call graph is constructed. The node L_3
represents the set {[c, b, a], [c, b', a'], [c', b, a], [c', b', a']} of
invocation histories.



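a    : jsr(L_1)
a'   : jsr(L'_1)

L_1  : ...
b    : jsr(L_2)

L'_1 : ...
b'   : jsr(L_2)

L_2  : ...
c    : jsr(L_3)
c'   : jsr(L_3)

L_3  : ...

Call graph edges: root -a-> L_1, root -a'-> L'_1, L_1 -b-> L_2, L'_1 -b'-> L_2,
L_2 -c-> L_3, L_2 -c'-> L_3.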
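
The correspondence between a call-graph node and its set of invocation histories can be sketched as follows. This is an illustration only: the edge-list encoding, the node name `root` and the function `histories` are assumptions made for the example, not part of the implementation described here.

```python
def histories(edges, root, node):
    """Return all invocation histories reaching `node`, most recent jsr first.

    `edges` is a list of (caller_node, jsr_label, callee_node) triples of an
    acyclic call graph (no recursion).
    """
    if node == root:
        return [[]]
    result = []
    for caller, label, callee in edges:
        if callee == node:
            for h in histories(edges, root, caller):
                result.append([label] + h)
    return result


# Call graph of the example program above.
edges = [
    ("root", "a", "L1"),
    ("root", "a'", "L1'"),
    ("L1", "b", "L2"),
    ("L1'", "b'", "L2"),
    ("L2", "c", "L3"),
    ("L2", "c'", "L3"),
]
l3_histories = histories(edges, "root", "L3")
```

The single node L3 stands for four histories here. Keeping the node instead of the set is what avoids re-analyzing a subroutine once per history.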





On a New Method for Dataflow Analysis 



31 



If histories are represented by nodes in the call graph, then the values that 
Hi can take are bounded by the set of all the nodes in the call graph. This means 
that Hi can only be updated for the number of times equal to the number of 
nodes. The number of overall updates is, therefore, limited by n², where n is
the number of instructions in the program. In order to achieve a polynomial
complexity of the entire analysis, however, the nondeterministic choice in the
handling of jsr instructions must be restricted.

6 Concluding Remark 

Since we introduced types of the form last(x), it has become possible to assign
types to polymorphic subroutines that move a value from one variable to another.
Our analysis is a step towards real polymorphism of subroutines in binary code,
because we do not simply ignore unaccessed variables. For some programs, our
method is more powerful than existing ones. In particular, we impose no restric- 
tions on the number of entries and exits of a subroutine. It is also important that 
the proof of the correctness of bytecode verification becomes extremely simple. 

We only formalized a very small subset of the JVM. We believe that the 
framework of the paper can be extended to the full language. The extension is 
almost straightforward. In particular, adding new kinds of types seems to cause
no difficulty. The correct handling of exceptions and that of object initialization 
are problematic but are not impossible jl3lt)j . The resulting bytecode verifier is 
expected to be more powerful than the existing one, so the Java compiler will 
gain more freedom in bytecode generation. 

It is also interesting whether the framework can be applied to the analysis 
of other kinds of bytecode or native code. Handling recursive calls is the key 
to such applications. In order to allow recursion, we must be able to represent 
histories of an indefinite length by a kind of regular expression. Stacks generated 
by recursive calls should also be represented by regular expressions. All this is 
left as future work. 

A dataflow analysis, in general, assigns an abstract value x_i to the i-th
instruction so that a certain predicate P(x_i, σ) always holds for any state σ that
reaches the i-th instruction. To this end, for any transition σ → σ', where σ' is
at i', one must show that P(x_i, σ) implies P(x_{i'}, σ').

Since σ corresponds to (i, f, s, h) in our analysis, x_i seems to correspond to
(F_i, S_i, H_i). However, the predicate (i, f, s, h) : (F, S, H), which should
correspond to P(x_i, σ), does not only refer to (F_i, S_i, H_i). When F_i[y] is last, it
also refers to F_{h[0]}. This means that in terms of last types, our analysis relates
values assigned to different instructions. This makes the analysis powerful while
keeping the overall data structure for the analysis compact.

The representation of invocation histories by a node in the call graph is also 
for making the data structure small and efficient. By this representation, the 
number of updates of Hi is limited by the size of the call graph, and an iterative 
algorithm is expected to stop in polynomial time with respect to the program 
size. This kind of complexity analysis should be made rigorous in the future. 

Abstract. We present a new static analysis technique based on Array
SSA form [HI. Compared to traditional SSA form, the key enhancement in 
Array SSA form is that it deals with arrays at the element level instead 
of as monolithic objects. In addition, Array SSA form improves the φ
function used for merging scalar or array variables in traditional SSA
form. The computation of a φ function in traditional SSA form depends
on the program’s control flow in addition to the arguments of the φ
function. Our improved φ function (referred to as a Φ function) includes
the relevant control flow information explicitly as arguments through
auxiliary variables that are called @ variables.

The @ variables and Φ functions were originally introduced as run-time
computations in Array SSA form. In this paper, we use the element-level
Φ functions in Array SSA form for enhanced static analysis. We
use Array SSA form to extend past algorithms for Sparse Constant 
propagation (SC) and Sparse Conditional Constant propagation (SCC) 
by enabling constant propagation through array elements. In addition, 
our formulation of array constant propagation as a set of data flow 
equations enables integration with other analysis algorithms that are 
based on data flow equations. 

Keywords: static single assignment (SSA) form, constant propagation,
conditional constant propagation, Array SSA form, unreachable code
elimination.

1 Introduction

The problems of constant propagation and conditional constant propagation (a 
combination of constant propagation and unreachable code elimination) have 
been studied for several years. However, past algorithms limited their attention to 
constant propagation of scalar variables only. In this paper, we introduce efficient 
new algorithms that perform constant propagation and conditional constant 
propagation through both scalar and array references. 

One motivation for constant propagation of array variables is in optimization 
of scientific programs in which certain array elements can be identified as
constant. For example, the SPEC95fp P| benchmark 107.mgrid contains an array
variable A that is initialized to four constant-valued elements as shown in Fig. 1.
A significant amount of the time in this application is spent in the triply nested
loop shown at the bottom of Fig. 1. Since constant propagation can determine
that A (2) equals zero in the loop, an effective optimization is to eliminate the 
entire multiplicand of A (2) in the loop nest. Doing so eliminates 11 of the 23 
floating-point additions in the loop nest thus leading to a significant speedup. 

Another motivation is in analysis and optimization of field accesses of
structure variables or objects in object-oriented languages such as Java and C++.
A structure can be viewed as a fixed-size array, and a read/write operation of
a structure field can be viewed as a read/write operation of an array element 
through a subscript that is a compile-time constant. This approach is more 
compact than an approach in which each field of a structure is modeled as a 
separate scalar variable. This technique for modeling structures as arrays directly 
extends to nested arrays and structures. For example, an array of rank n of 
some structure type can be modeled as an array of rank n + 1. Therefore, the 
constant propagation algorithms presented in this paper can be efficiently applied 
to structure variables and to arrays of structures. Extending these algorithms 
to analyze programs containing pointer aliasing is a subject for future research, 
however. 

The best known algorithms for sparse constant propagation of scalar
variables I8I2I are based on static single assignment (SSA) form. However,
traditional SSA form views arrays as monolithic objects, which is an inadequate
view for analyzing and optimizing programs that contain reads and writes of 
individual array elements. In past work, we introduced Array SSA form jOj to 
address this deficiency. The primary application of Array SSA form in |S| was 
to enable parallelization of loops not previously parallelizable by making Array 
SSA form manifest at run-time. In this paper, we use Array SSA form as a 
basis for static analysis, which means that the Array SSA form structures can 
be removed after the program properties of interest have been discovered. 

Array SSA form has two distinct advantages over traditional SSA form. First,
the φ operator in traditional SSA form is not a pure function and returns different
values for the same arguments depending on the control flow path that was
taken. In contrast, the corresponding Φ operator in Array SSA form includes
@ variables as extra arguments to capture the control information required, i.e.,
x_3 := φ(x_2, x_1) in traditional SSA form becomes x_3 := Φ(x_2, @x_2, x_1, @x_1) in

! Initialization of array A
REAL*8 A(0:3)
A(0) = -8.0D0/3.0D0
A(1) = 0.0D0
A(2) = 1.0D0/6.0D0
A(3) = 1.0D0/12.0D0

! Computation loop in subroutine RESID()
do i3 = 2, n-1
  do i2 = 2, n-1
    do i1 = 2, n-1
      R(i1,i2,i3) = V(i1,i2,i3)
        -A(0)*( U(i1,  i2,  i3  ) )
        -A(1)*( U(i1-1,i2,  i3  ) + U(i1+1,i2,  i3  )
              + U(i1,  i2-1,i3  ) + U(i1,  i2+1,i3  )
              + U(i1,  i2,  i3-1) + U(i1,  i2,  i3+1) )
        -A(2)*( U(i1-1,i2-1,i3  ) + U(i1+1,i2-1,i3  )
              + U(i1-1,i2+1,i3  ) + U(i1+1,i2+1,i3  )
              + U(i1,  i2-1,i3-1) + U(i1,  i2+1,i3-1)
              + U(i1,  i2-1,i3+1) + U(i1,  i2+1,i3+1)
              + U(i1-1,i2,  i3-1) + U(i1-1,i2,  i3+1)
              + U(i1+1,i2,  i3-1) + U(i1+1,i2,  i3+1) )
        -A(3)*( U(i1-1,i2-1,i3-1) + U(i1+1,i2-1,i3-1)
              + U(i1-1,i2+1,i3-1) + U(i1+1,i2+1,i3-1)
              + U(i1-1,i2-1,i3+1) + U(i1+1,i2-1,i3+1)
              + U(i1-1,i2+1,i3+1) + U(i1+1,i2+1,i3+1) )
    end do
  end do
end do

Fig. 1. Code fragments from the SPEC95fp 107.mgrid benchmark

Array SSA form. Second, Array SSA form operates on arrays at the element
level rather than as monolithic objects. In particular, a Φ operator in Array SSA
form, A_3 := Φ(A_2, @A_2, A_1, @A_1), represents an element-level merge of A_2 and
A_1.

Both advantages of Array SSA form are significant for static analysis. The
fact that Array SSA form operates at the element level facilitates transfer of
statically derived information across references to array elements. The fact that
the Φ is a known pure function facilitates optimization and simplification of the
Φ operations.

For convenience, we assume that all array operations in the input program
are expressed as reads and writes of individual array elements. The extension
to more complex array operations (e.g., as in Fortran 90 array language) is
straightforward, and omitted for the sake of brevity. Also, for simplicity, we
will restrict constant propagation of array variables to cases in which both the
subscript and the value of an array definition are constant, e.g., our algorithm
might recognize that a definition A[k] := i is really A[2] := 99 and propagate
this constant into a use of A[2]. The algorithms presented in this paper will not
consider a definition to be constant if its subscript has a non-constant value, e.g.,
A[m] := 99 where m is not a constant. Performing constant propagation for
such references (e.g., propagating 99 into a use of A[m] when legal to do so) is
a subject of future work.

The rest of the paper is organized as follows. Section 2 reviews the Array
SSA form introduced in |E|. Section 3 presents our extension to the Sparse
Constant propagation (SC) algorithm from |H| that enables constant propagation
through array elements. It describes how lattice values can be computed
for array variables and Φ functions. Section 4 presents our extension to the
Sparse Conditional Constant propagation (SCC) algorithm from 0 that enables
constant propagation through array elements in conjunction with unreachable
code elimination. Section 5 discusses related work, and Sect. 6 contains our
conclusions.

2 Array SSA Form 

In this section, we describe the Array SSA form introduced in |S|. The goal of 
Array SSA form is to provide the same benefits for arrays that traditional SSA 
provides for scalars but, as we will see, it has advantages over traditional SSA 
form for scalars as well. We first describe its use for scalar variables and then its 
use for array variables. 

The salient properties of traditional SSA form are as follows: 

1. Each definition is assigned a unique name. 

2. At certain points in the program, new names are generated which combine
the results from several definitions. This combining is performed by a φ
function which determines which of several values to use, based on the flow
path traversed.

3. Each use refers to exactly one name generated from either of the two rules 
above. 

For example, traditional SSA form converts the code in Fig. 2 to that in
Fig. 3. The S_3 := φ(S_1, S_2) statement defines S_3 as a new name that represents
the merge of definitions S_1 and S_2. It is important to note that the φ function
in traditional SSA form is not a pure function of S_1 and S_2 because its value
depends on the path taken through the if statement. Notice that this path is
unknown until runtime and may vary with each dynamic execution of this code.

In contrast to traditional SSA form, the semantics of a Φ function is defined
to be a pure function in our Array SSA form. This is accomplished by
introducing @ variables (pronounced “at variables”), and by rewriting a φ
function in traditional SSA form such as φ(S_1, S_2) as a new kind of Φ function,



Enabling Sparse Constant Propagation of Array Elements 



37 



Φ(S_1, @S_1, S_2, @S_2). For each static definition S_k, its @ variable @S_k identifies
the most recent “time” at which S_k was modified by this definition.

For an acyclic control flow graph, a static definition S_k may execute either
zero times or one time. These two cases can be simply encoded as @S_k = false
and @S_k = true to indicate whether or not definition S_k was executed. For
a control flow graph with cycles (loops), a static definition S_k may execute an
arbitrary number of times. In general, we need more detailed information for
the @S_k = true case so as to distinguish among different dynamic execution
instances of static definition S_k. Therefore, @S_k is set to contain the dynamic
iteration vector at which the static definition S_k was last executed.

The iteration vector of a static definition Sk identifies a single iteration in 
the iteration space of the set of loops that enclose the definition. Let n be the 
number of loops that enclose a given definition. For convenience, we treat the 
outermost region of acyclic control flow in a procedure as a dummy outermost 
loop with a single iteration. Therefore n ≥ 1 for each definition. A single point in
the iteration space is specified by the iteration vector i = (i_1, ..., i_n), which is an
n-tuple of iteration numbers, one for each enclosing loop. We do not require that
the surrounding loops be structured counted loops (i.e., like Fortran DO loops)
or that the surrounding loops be tightly nested. Our only assumption is that all 
loops are single-entry, or equivalently, that the control flow graph is reducible 
[Q. For single-entry loops, we know that each def executes at most once in a 
given iteration of its surrounding loops. All structured loops (e.g., do, while, 
repeat-until) are single-entry even when they contain multiple exits; also, most 
unstructured loops (built out of goto statements) found in real programs are 
single-entry as well. A multiple-entry loop can be transformed into multiple 
single-entry loops by node splitting m 

Array SSA form can be used either at run-time as discussed in or for static 
analysis, as in the constant propagation algorithms presented in this paper. In 
this section, we explain the meaning of @ variables as if they are computed 
at run-time. We assume that all @ variables, @S_k, are initialized to the empty
vector, @S_k := ( ), at the start of program execution. For each real (non-Φ)
definition, S_k, we assume that a statement of the form @S_k := i is inserted
immediately after definition S_k, where i is the current iteration vector for all
loops that surround S_k. Each Φ definition also has an associated @ variable. Its
semantics will be defined shortly. All @ variables are initialized to the empty
vector because the empty vector is the identity element for a lexicographic max
operation, i.e., max(( ), i) = i for any @ variable value i.

As a simple example, Fig. 4 shows the Array SSA form for the program
in Fig. 2. Note that @ variables @S_1 and @S_2 are explicit arguments of the
Φ function. In this example of acyclic code, there are only two possible values

^ It may appear that the @ variables do not satisfy the static single assignment 
property because each @Sk variable has two static definitions, one in the initialization 
and one at the real definition of S_k. However, the initialization def is executed only
once at the start of program execution and can be treated as a special-case initial 
value rather than as a separate definition. 
for each @ variable: the empty vector ( ) and the unit vector (1), which
correspond to false and true respectively.

Figure 5 shows an example for-loop and its conversion to Array SSA form.
Because of the presence of a loop, the set of possible values for an @ variable
becomes unbounded, e.g., we may have @S_1 = (100) on exit from the loop.
However, @S_1 and @S_2 are still explicit arguments of the Φ function, and their
iteration vector values are necessary for evaluating the Φ function at run-time.

The semantics of a Φ function can now be specified by a conditional expression
that is a pure function of the arguments of the Φ. For example, the semantics
of the Φ function, S_3 := Φ(S_2, @S_2, S_1, @S_1) in Fig. 5 can be expressed as a
conditional expression as follows (where ⪰ denotes a lexicographic greater-than-
or-equal comparison of iteration vectors):

      if @S_2 ⪰ @S_1 then S_2
S_3 = else S_1
      end if

Following each Φ def, S_3 := Φ(S_2, @S_2, S_1, @S_1), there is the definition of
the associated @ variable, @S_3 = max(@S_2, @S_1), where max represents a
lexicographic maximum operation of iteration vector values @S_2, @S_1.

Consider, for example, a condition C in Fig. 5 that checks if the value of i is
even. In this case, definition S_1 is executed in every iteration and definition S_2
is executed only in iterations 2, 4, 6, .... For this “even-value” branch condition,
the final values of @S_1 and @S_2 are both equal to (100) if m = 100. Since these
values satisfy the condition @S_2 ⪰ @S_1, the conditional expression will yield
S_3 = S_2.

Consider another execution of the for-loop in Fig. 5 in which condition C
evaluates to false in each iteration of the for loop. For this execution, the final
values of @S_2 and @S_1 will be the empty vector ( ) and (100) respectively.
Therefore, @S_2 ⪰ @S_1 does not hold, and the conditional expression for the Φ
function will yield S_3 = S_1 for this execution.
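
The two executions just described can be replayed with a small run-time model. This is a sketch only: iteration vectors are modeled as Python tuples, whose built-in comparison is lexicographic and for which the empty tuple is smaller than any non-empty tuple. The names `phi` and `run_loop` and the tagged values `("S1", i)` are illustrative assumptions, and the loop-header Φ for S_0 is left out.

```python
def phi(s2, at_s2, s1, at_s1):
    """Pure function: choose the value whose @ variable is lexicographically latest."""
    return s2 if at_s2 >= at_s1 else s1


def run_loop(m, cond):
    """Model the body of the for-loop in Array SSA form for i = 1..m."""
    at_s1 = at_s2 = ()          # all @ variables start as the empty vector
    s1 = s2 = s3 = None
    at_s3 = ()
    for i in range(1, m + 1):
        s1, at_s1 = ("S1", i), (i,)       # S_1 executes in every iteration
        if cond(i):
            s2, at_s2 = ("S2", i), (i,)   # S_2 executes only when C holds
        s3 = phi(s2, at_s2, s1, at_s1)
        at_s3 = max(at_s2, at_s1)         # lexicographic maximum
    return s3, at_s3


even_run = run_loop(100, lambda i: i % 2 == 0)
never_run = run_loop(100, lambda i: False)
```

With the even-value condition both @ variables end at (100,), so the Φ picks S_2. With C always false, @S_2 stays empty and the Φ picks S_1.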

The above description outlines how @ variables and Φ functions can be
computed at run-time. However, if Array SSA form is used for static analysis,
then no run-time overhead is incurred due to the @ variables and Φ functions.
Instead, the @ variables and Φ functions are inserted in the compiler intermediate
representation prior to analysis, and then removed after the program properties 
of interest have been discovered by static analysis. 

We now describe Array SSA form for array variables. Figure 6 shows an
example program with an array variable, and the conversion of the program to 
Array SSA form as defined in |0j. The key differences between Array SSA form 
for array variables and Array SSA form for scalar variables are as follows: 

1. Array-valued @ variables:
The @ variable is an array of the same shape as the array variable with
which it is associated, and each element of an @ array is initialized to the
empty vector. For example, the statement @A_1[k] := (1) is inserted after

if (C) then
  S := ...
else
  S := ...
end if

Fig. 2. Control Flow with Scalar Definitions



if (C) then
  S_1 := ...
else
  S_2 := ...
end if
S_3 := φ(S_1, S_2)

Fig. 3. Traditional SSA form



@S_1 := ( )
@S_2 := ( )
if (C) then
  S_1 := ...
  @S_1 := (1)
else
  S_2 := ...
  @S_2 := (1)
end if
S_3 := Φ(S_1, @S_1, S_2, @S_2)

Fig. 4. After conversion of Fig. 2 to Array SSA form



40 



V. Sarkar and K. Knobe 



Example for-loop:

S := ...
for i := 1 to m do
  S := ...
  if (C) then
    S := ...
  end if
end for

After conversion to Array SSA form:

@S_1 := ( ) ; @S_2 := ( )
S := ...
@S := (1)
for i := 1 to m do
  S_0 := Φ(S_3, @S_3, S, @S)
  @S_0 := max(@S_3, @S)
  S_1 := ...
  @S_1 := (i)
  if (C) then
    S_2 := ...
    @S_2 := (i)
  end if
  S_3 := Φ(S_2, @S_2, S_1, @S_1)
  @S_3 := max(@S_2, @S_1)
end for

Fig. 5. A for-loop and its conversion to Array SSA form

Example program with array variables:

n1: A[*] := initial value of A
    i := 1
    C := i < n
    if C then
n2:   k := 2 * i
      A[k] := i
      print A[k]
    endif
n3: print A[2]

After conversion to Array SSA form:

n1: @i := ( ) ; @C := ( ) ; @k := ( ) ; @A_0[*] := ( ) ; @A_1[*] := ( )
    A_0[*] := initial value of A
    @A_0[*] := (1)
    i := 1
    @i := (1)
    C := i < n
    @C := (1)
    if C then
n2:   k := 2 * i
      @k := (1)
      A_1[k] := i
      @A_1[k] := (1)
      A_2 := dΦ(A_1, @A_1, A_0, @A_0)
      @A_2 := max(@A_1, @A_0)
      print A_2[k]
    endif
n3: A_3 := Φ(A_2, @A_2, A_0, @A_0)
    @A_3 := max(@A_2, @A_0)
    print A_3[2]

Fig. 6. Example program with an array variable, and its conversion to Array SSA form

statement A_1[k] := i in Fig. 6. In general, @A_1 can record a separate
iteration vector for each element that is assigned by definition A_1. This
initialization is only required for @ arrays corresponding to real (non-Φ)
definitions. No initialization is required for an @ array for a Φ definition
(such as @A_2 and @A_3 in Fig. 6) because its value is completely determined
by other @ arrays.

2. Array-valued Φ functions:
A Φ function for array variables returns an array value. For example, consider
the Φ definition A_3 := Φ(A_2, @A_2, A_0, @A_0) in Fig. 6, which represents a
merge of arrays A_2 and A_0. The semantics of the Φ function is specified by
the following conditional expression for each element, A_3[j]:

         if @A_2[j] ⪰ @A_0[j] then A_2[j]
A_3[j] = else A_0[j]
         end if

Note that this conditional expression uses a lexicographic comparison (⪰) of
@ values just as in the scalar case.

3. Definition Φ’s:
The traditional placement of the Φ is at control merge points. We refer to
this as a control Φ. A special new kind of Φ function is inserted immediately
after each original program definition of an array variable that does not
completely kill the array value. This definition Φ merges the value of the
element modified in the definition with the values available immediately prior
to the definition. Definition Φ’s did not need to be inserted for definitions
of scalar variables because a scalar definition completely kills the old value
of the variable. We will use the notation dΦ when we want to distinguish a
definition Φ function from a control Φ function.

For example, consider definition A_1 in Fig. 6. The dΦ function, A_2 :=
dΦ(A_1, @A_1, A_0, @A_0) is inserted immediately after the def of A_1 to represent
an element-by-element merge of A_1 and A_0. Any subsequent use of the
original program variable A (before an intervening def) will now refer to A_2
instead of A_1. The semantics of the dΦ function is specified by the following
conditional expression for each element, A_2[j]:

         if @A_1[j] ⪰ @A_0[j] then A_1[j]
A_2[j] = else A_0[j]
         end if

Note that this conditional expression is identical in structure to the
conditional expression for the control Φ function in item 2 above.
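
The element-level merge can be sketched for the program of Fig. 6 in the execution where the branch is taken. This is a run-time model under assumptions: arrays and @ arrays are encoded as dictionaries from index to value, a missing @ entry stands for the empty vector, and the function name `array_phi`, the four-element index range and the initial values 10..13 are made up for the illustration.

```python
def array_phi(a_new, at_new, a_old, at_old, indices):
    """Element-by-element Φ (or dΦ): per index, keep the lexicographically latest write."""
    merged, at_merged = {}, {}
    for j in indices:
        t_new = at_new.get(j, ())   # missing entry = empty iteration vector
        t_old = at_old.get(j, ())
        merged[j] = a_new.get(j) if t_new >= t_old else a_old.get(j)
        at_merged[j] = max(t_new, t_old)
    return merged, at_merged


indices = range(4)
# n1: A_0[*] := initial value of A ; @A_0[*] := (1)
a0 = {j: 10 + j for j in indices}
at_a0 = {j: (1,) for j in indices}
# n2: k := 2 * i with i = 1 ; A_1[k] := i ; @A_1[k] := (1)
i = 1
k = 2 * i
a1, at_a1 = {k: i}, {k: (1,)}
# A_2 := dΦ(A_1, @A_1, A_0, @A_0)
a2, at_a2 = array_phi(a1, at_a1, a0, at_a0, indices)
# A_3 := Φ(A_2, @A_2, A_0, @A_0)
a3, at_a3 = array_phi(a2, at_a2, a0, at_a0, indices)
```

Only element 2 takes its value from A_1. Every other element falls through to A_0, which is exactly the information a static analysis needs to conclude something about A_3[2].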



3 Sparse Constant Propagation for Scalars and Array 
Elements 

We now present our extension to the Sparse Constant propagation (SC) al- 
gorithm from jSj that enables constant propagation through array elements. 
Section 3.1 contains our definitions of lattice elements for scalar and array
variables; the main extension in this section is our modeling of lattice elements for
array variables. Section 3.2 outlines our sparse constant propagation algorithm;
the main extension in this section is the use of the definition Φ operator in Array
SSA form to perform constant propagation through array elements.



3.1 Lattice Values for Scalar and Array Variables 

Recall that a lattice consists of: 

- L, a set of lattice elements. A lattice element for a program variable v is
written as L(v), and denotes SET(L(v)) = a set of possible values for variable v.

- ⊤ (“top”) and ⊥ (“bottom”), two distinguished elements of L.

- A meet (or join) operator, ⊓, such that for any lattice element e, e ⊓ ⊤ = e
and e ⊓ ⊥ = ⊥.

- A ⊒ operator such that e ⊒ f if and only if e ⊓ f = f, and a ⊐ operator
such that e ⊐ f if and only if e ⊒ f and e ≠ f.

The height H of lattice L is the length of the largest sequence of lattice elements
e_1, e_2, ..., e_H such that e_i ⊐ e_{i+1} for all 1 ≤ i < H.

We use the same approach as in | 5 | for modeling lattice elements for scalar 
variables. Given a scalar variable S, the value of L(S) in our framework can be
⊤, Constant or ⊥. When the value is Constant we also maintain the value
of the constant. The sets denoted by these lattice elements are SET(⊤) = { },
SET(Constant) = {Constant}, and SET(⊥) = U^S, where U^S is the universal set
of values for variable S.
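
A minimal sketch of this scalar lattice follows. The two identities e ⊓ ⊤ = e and e ⊓ ⊥ = ⊥ come from the definition above. The rule that two different constants meet at ⊥ is the usual convention for constant propagation lattices and is an assumption here, as are the encoding of constants as `("const", value)` and the names `meet` and `geq`.

```python
TOP, BOTTOM = "TOP", "BOTTOM"


def meet(e, f):
    """Meet of two scalar lattice elements."""
    if e == TOP:
        return f
    if f == TOP:
        return e
    if e == BOTTOM or f == BOTTOM:
        return BOTTOM
    # Both are constants: equal constants stay, different ones fall to BOTTOM
    # (assumed convention, not stated in the text above).
    return e if e == f else BOTTOM


def geq(e, f):
    """e ⊒ f if and only if e ⊓ f = f."""
    return meet(e, f) == f


c3, c4 = ("const", 3), ("const", 4)
chain = [TOP, c3, BOTTOM]   # a longest strictly descending chain
```

The chain ⊤ ⊐ Constant ⊐ ⊥ has three elements, so the scalar lattice has height 3 under this reading.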

We now describe how lattice elements for array variables are represented in
our framework. Let U_ind^A and U_elem^A be the universal set of index values and
the universal set of array element values respectively for an array variable A in
Array SSA form. For an array variable, the set denoted by lattice element L(A)
is a subset of U_ind^A × U_elem^A, i.e., a set of index-element pairs. Since we restrict
constant propagation of array variables to cases in which both the subscript and
the value of an array definition are constant, there are only three kinds of lattice
elements of interest:

1. L(A) = ⊤ ⇒ SET(L(A)) = { }

This “top” case means that the possible values of A have yet to be determined,
i.e., the set of possible index-element pairs that have been identified thus far
for A is the empty set, { }.

2. L(A) = ((i_1, e_1), (i_2, e_2), ...)
⇒ SET(L(A)) = {(i_1, e_1), (i_2, e_2), ...} ∪ (U_ind^A − {i_1, i_2, ...}) × U_elem^A

In general, the lattice value for this “constant” case is represented by a finite
ordered list of index-element pairs, ((i_1, e_1), (i_2, e_2), ...) where i_1, e_1, i_2, e_2, ...
are all constant. The list is sorted in ascending order of the index values,
i_1, i_2, ..., and all the index values are assumed to be distinct.

                          ⊤

  <(1,100), (2,101)>   ...   <(2,101), (3,102)>

  <(1,100)>   ...   <(2,101)>   ...   <(3,102)>

                          ⊥

Fig. 7. Lattice elements of array values with maximum list size Z = 2

The meaning of this “constant” lattice value is that the current stage of
analysis has determined some finite number of constant index-element pairs
for array variable A, such that A[i_1] = e_1, A[i_2] = e_2, .... All other elements
of A are assumed to be non-constant. These properties are captured by
SET(L(A)) defined above as the set denoted by lattice value L(A).

For the sake of efficiency, we will restrict these constant lattice values to
ordered lists that are bounded in size by a small constant, Z ≥ 1, e.g., if
Z = 5 then all constant lattice values will have ≤ 5 index-element pairs.
Doing so ensures that the height of the lattice for array values is at most
(Z + 2). If any data flow equation yields a lattice value with P > Z pairs,
then this size constraint is obeyed by conservatively dropping any (P − Z)
index-element pairs from the ordered list.

Note that the lattice value for a real (non-Φ) definition will contain at most
one index-element pair, since we assumed that an array assignment only
modifies a single element. Ordered lists with size > 1 can only appear as the
output of Φ functions.

3. L(A) = ⊥ ⇒ SET(L(A)) = U_ind^A × U_elem^A

This “bottom” case means that, according to the approximation in the
current stage of analysis, array A may take on any value from the universal
set of index-element pairs. Note that L(A) = ⊥ is equivalent to an empty
ordered list, L(A) = ( ).







The lattice ordering ($\sqsupset$) for these elements is determined by the subset
relationship among the sets that they denote. The lattice structure for the
$Z = 2$ case is shown in Fig. 7. This lattice has four levels. The second level (just
below $\top$) contains all possible ordered lists that contain exactly two constant
index-element pairs. The third level (just above $\bot$) contains all possible ordered
lists that contain a single constant index-element pair. For example, consider
two lattice elements $\mathcal{L}_1 = \langle (1, 100), (2, 101) \rangle$
and $\mathcal{L}_2 = \langle (2, 101) \rangle$. The sets denoted by these lattice elements are:

$\mathrm{SET}(\mathcal{L}_1) = \{(1, 100), (2, 101)\} \cup (\mathcal{U}^A_{ind} - \{1, 2\}) \times \mathcal{U}^A_{elem}$

$\mathrm{SET}(\mathcal{L}_2) = \{(2, 101)\} \cup (\mathcal{U}^A_{ind} - \{2\}) \times \mathcal{U}^A_{elem}$

Therefore, $\mathrm{SET}(\mathcal{L}_1)$ is a proper subset of $\mathrm{SET}(\mathcal{L}_2)$ and we have $\mathcal{L}_1 \sqsupset \mathcal{L}_2$, i.e., $\mathcal{L}_1$
is above $\mathcal{L}_2$ in the lattice in Fig. 7.

Finally, the meet operator ($\sqcap$) for two lattice elements, $\mathcal{L}_1$ and $\mathcal{L}_2$, for array
variables is defined in Fig. 8, where $\mathcal{L}_1 \cap \mathcal{L}_2$ denotes an intersection of ordered
lists $\mathcal{L}_1$ and $\mathcal{L}_2$.



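  $\mathcal{L}_3 = \mathcal{L}_1 \sqcap \mathcal{L}_2$                         | $\mathcal{L}_2 = \top$ | $\mathcal{L}_2 = \langle (i_1, e_1), \ldots \rangle$ | $\mathcal{L}_2 = \bot$
  ---------------------------------------+-----------+------------------------------+----------
  $\mathcal{L}_1 = \top$                              | $\top$    | $\mathcal{L}_2$                        | $\bot$
  $\mathcal{L}_1 = \langle (i'_1, e'_1), \ldots \rangle$ | $\mathcal{L}_1$     | $\mathcal{L}_1 \cap \mathcal{L}_2$               | $\bot$
  $\mathcal{L}_1 = \bot$                              | $\bot$    | $\bot$                       | $\bot$

Fig. 8. Lattice computation for the meet operator, $\mathcal{L}_3 = \mathcal{L}_1 \sqcap \mathcal{L}_2$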
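
The meet table of Fig. 8 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `TOP`, `BOTTOM`, `normalize` and `meet`, the tuple encoding of ordered lists, and the policy of keeping the pairs with the smallest indices when the bound Z is exceeded are all choices made here, not taken from the paper.

```python
# Illustrative encoding (an assumption): an ordered list is a sorted tuple of
# (index, element) pairs; TOP and BOTTOM are sentinel strings.
TOP, BOTTOM = "TOP", "BOTTOM"
Z = 2  # assumed bound on the number of index-element pairs per list


def normalize(pairs):
    """Sort pairs by index and keep at most Z of them.

    Dropping pairs is conservative; an empty list is equivalent to BOTTOM.
    """
    kept = tuple(sorted(pairs))[:Z]
    return kept if kept else BOTTOM


def meet(l1, l2):
    """Meet of two array lattice values, following the table of Fig. 8."""
    if l1 == TOP:
        return l2
    if l2 == TOP:
        return l1
    if l1 == BOTTOM or l2 == BOTTOM:
        return BOTTOM
    # Both are ordered lists: keep only pairs that agree in index and element.
    return normalize(set(l1) & set(l2))
```

For instance, the meet of the two second-level elements shown in Fig. 7 is the single-pair list for index 2, while two lists that disagree on the element stored at the same index meet at bottom.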



3.2 The Algorithm 

Recall that the @ variables defined in Sect. 2 were necessary for defining the
full execution semantics of Array SSA form. For example, the semantics of a
$\Phi$ operator, $A_2 := \Phi(A_1, @A_1, A_0, @A_0)$, is defined by the following conditional
expression:

    $A_2[j]$ = if $@A_1[j] \succeq @A_0[j]$ then $A_1[j]$
               else $A_0[j]$
               end if

The sparse constant propagation algorithm presented in this section is a
static analysis that is based on conservative assumptions about runtime
behavior. Let us first consider the case when the above $\Phi$ operator is a
control $\Phi$. Since the algorithm in this section does not perform conditional constant
propagation, the lattice computation of a control $\Phi$ can be simply defined as
$\mathcal{L}(A_2) = \mathcal{L}(\Phi(A_1, @A_1, A_0, @A_0)) = \mathcal{L}(A_1) \sqcap \mathcal{L}(A_0)$, i.e., as the meet of the lattice
values $\mathcal{L}(A_1)$ and $\mathcal{L}(A_0)$. Therefore, the lattice computation for $A_2$ does not
depend on @ variables $@A_1$ and $@A_0$ for a control $\Phi$ operator.

Now, consider the case when the above $\Phi$ operator is a definition $\Phi$. The
lattice computation for a definition $\Phi$ is shown in Fig. 9. Since $A_1$ corresponds
to a definition of a single array element, the ordered list for $\mathcal{L}(A_1)$ can contain
at most one pair. The INSERT operation in Fig. 9 is assumed to return a new
ordered list obtained by inserting $(i', e')$ into $\langle (i_1, e_1), \ldots \rangle$ with the following
adjustments if needed:

— If there exists an index-element pair $(i_j, e_j)$ in $\langle (i_1, e_1), \ldots \rangle$ such that $i' = i_j$,
then the INSERT operation just replaces $(i_j, e_j)$ by $(i', e')$.

— If the INSERT operation causes the size of the list to exceed the threshold
size $Z$, then one of the pairs is dropped from the output list so as to satisfy
the size constraint.

Interestingly, we again do not need @ variables for the lattice computation in
Fig. 9. This is because the ordered list representation for array lattice values
already contains all the subscript information of interest, and overlaps with the
information that would have been provided by @ variables.



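  $\mathcal{L}(A_2)$                            | $\mathcal{L}(A_0) = \top$ | $\mathcal{L}(A_0) = \langle (i_1, e_1), \ldots \rangle$                       | $\mathcal{L}(A_0) = \bot$
  -------------------------------------+--------------+----------------------------------------------------+-------------
  $\mathcal{L}(A_1) = \top$                    | $\top$       | $\top$                                             | $\top$
  $\mathcal{L}(A_1) = \langle (i', e') \rangle$ | $\top$       | INSERT($(i', e')$, $\langle (i_1, e_1), \ldots \rangle$) | $\langle (i', e') \rangle$
  $\mathcal{L}(A_1) = \bot$                    | $\bot$       | $\bot$                                             | $\bot$

Fig. 9. Lattice computation for $\mathcal{L}(A_2) = \mathcal{L}_{d\phi}(\mathcal{L}(A_1), \mathcal{L}(A_0))$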
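
As a hedged illustration of Fig. 9 and of the INSERT adjustments described above, the following Python sketch implements the definition case. The function names, the tuple encoding, and in particular the choice of which pair to drop when the list exceeds Z (here: the lowest-index pair other than the newly inserted one) are assumptions; the text only requires that some pair be dropped.

```python
TOP, BOTTOM = "TOP", "BOTTOM"
Z = 2  # assumed threshold on list size


def insert(pair, pairs):
    """Insert (i', e') into an ordered list, replacing any pair with index i'."""
    i_new, e_new = pair
    merged = dict(pairs)            # index -> element
    merged[i_new] = e_new           # same index: the old pair is replaced
    ordered = sorted(merged.items())
    while len(ordered) > Z:
        # Assumed policy: drop the first pair that is not the new one.
        victim = next(p for p in ordered if p[0] != i_new)
        ordered.remove(victim)
    return tuple(ordered)


def def_phi(l_a1, l_a0):
    """Lattice value of A2 for a definition phi, following Fig. 9."""
    if l_a1 == TOP:
        return TOP
    if l_a1 == BOTTOM:
        return BOTTOM
    # From here on l_a1 is a one-pair list <(i', e')>.
    if l_a0 == TOP:
        return TOP
    if l_a0 == BOTTOM:
        return l_a1
    return insert(l_a1[0], l_a0)
```

Keeping the newly defined pair when the bound is hit is only one reasonable choice; dropping any pair remains conservative because a shorter list denotes a larger set.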



Therefore, we do not need to analyze @ variables for the sparse constant 
propagation algorithm described in this section, because of our ordered list 
representation of lattice values for array variables. Instead, we can use a partial 
Array SSA form which is simply Array SSA form with all definitions and uses 
of @ variables removed, and with $\phi$ operators instead of $\Phi$ operators. If only
constant propagation is being performed, then it would be more efficient to only 
build the partial Array SSA form. However, if other optimizations are being 
performed that use Array SSA form, then we can build full Array SSA form and 
simply ignore the @ variables for this particular analysis. 

Our running example is shown in Fig. 10. The partial Array SSA form for this
example is shown in Fig. 11. The partial Array SSA form does not contain any @
variables since @ variables are not necessary for the level of analysis performed
by the constant propagation algorithms in this paper. The data flow equations
for this example are shown in Fig. 12. Each assignment in the Array SSA form
results in one data flow equation. The numbering S1 through S8 indicates the
correspondence.

The arguments to these equations are simply the current lattice values of
the variables. The lattice operations are specific to the operations within the
statement. Figures 13, 9 and 14 show the lattice computations for an assignment
to an array element (as in S3 and S5), a definition $\phi$ (as in S4 and S6), and a reference
to an array element (as in the RHS of S3 and S5). The lattice computation
for a $\phi$ assignment (as in S7) $A_3 = \phi(A_2, A_1)$ is $\mathcal{L}(A_3) = \mathcal{L}_\phi(\mathcal{L}(A_2), \mathcal{L}(A_1)) =
\mathcal{L}(A_1) \sqcap \mathcal{L}(A_2)$, where $\sqcap$ is shown in Fig. 8. Notice that we also include lattice
computations for specific arithmetic computations such as the multiply in S3 and
S5. This allows for constant computation as well as constant propagation. Tables
for these arithmetic computations are straightforward and are not shown.

The simple data flow algorithm is shown in Fig. 15. Our example includes
the propagation of values from a node to two successors ($Y$) and to a node
from two predecessors ($D$). Equations S1 and S2 are evaluated because they are
associated with the entry block. Element 3 of $Y_2$ is known to have the value 99
at this stage. As a result of the modification of $Y_2$, both S3 and S5 are inserted
into the worklist (they reference the lattice value of $Y_2$). S3 uses the propagated
constant 99 to compute and propagate the constant 198 to element 1 of $D_1$ and
then, after evaluation of S4, to element 1 of $D_2$. Any subsequent references to
$D$ in the then block of the source become references to $D_2$ in the Array SSA form
and are known to have a constant value at element 1. Depending on the order
of computations via the worklist, we may then compute either $D_3$ and then $D_4$
or, because $D_2$ has been modified, we may compute $D_5$. If we compute $D_5$ at
this point, it appears to have a constant value. Subsequent evaluations of $D_3$ and
$D_4$ cause $D_5$ to be reevaluated and lowered from a constant value to $\bot$ because
the value along one path is not constant.

Notice that in this case, the reevaluation of $D_5$ could have been avoided by
choosing an optimal ordering of processing. Processing of programs with cyclic
control flow is no more complex but may involve recomputation that cannot
be removed by optimal reordering. In particular, the loop entry is a control flow
merge point since control may enter from the top or come from the loop body.
It will contain a $\phi$ which combines the value entering from the top with that
returning after the loop. The lattice values for such a node may require multiple
evaluations.

Also notice that in this example, if $I$ in S5 is known to have the value 3, it
will be recorded as a constant element. Upon evaluation of S7, the intersection
of the sets associated with $D_2$ and $D_4$ will not be empty and element 1 of $D_5$
will be recorded as a constant.



    Y[3] := 99
    if C then
        D[1] := Y[3] * 2
    else
        D[1] := Y[I] * 2
    endif
    Z := D[1]

Fig. 10. Sparse Constant Propagation Example

         $Y_0$ and $D_0$ in effect here.

    S1:  $Y_1[3] := 99$
    S2:  $Y_2 := \phi(Y_1, Y_0)$
         if C then
    S3:      $D_1[1] := Y_2[3] * 2$
    S4:      $D_2 := \phi(D_1, D_0)$
         else
    S5:      $D_3[1] := Y_2[I] * 2$
    S6:      $D_4 := \phi(D_3, D_0)$
         endif
    S7:  $D_5 := \phi(D_2, D_4)$
    S8:  $Z := D_5[1]$

Fig. 11. Array SSA form for the Sparse Constant Propagation Example



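    S1: $\mathcal{L}(Y_1) = \langle (3, 99) \rangle$
    S2: $\mathcal{L}(Y_2) = \mathcal{L}_{d\phi}(\mathcal{L}(Y_1), \mathcal{L}(Y_0))$
    S3: $\mathcal{L}(D_1) = \mathcal{L}_{d[\,]}(\mathcal{L}_*(\mathcal{L}(Y_2[3]), 2))$
    S4: $\mathcal{L}(D_2) = \mathcal{L}_{d\phi}(\mathcal{L}(D_1), \mathcal{L}(D_0))$
    S5: $\mathcal{L}(D_3) = \mathcal{L}_{d[\,]}(\mathcal{L}_*(\mathcal{L}(Y_2[I]), 2))$
    S6: $\mathcal{L}(D_4) = \mathcal{L}_{d\phi}(\mathcal{L}(D_3), \mathcal{L}(D_0))$
    S7: $\mathcal{L}(D_5) = \mathcal{L}_\phi(\mathcal{L}(D_2), \mathcal{L}(D_4))$
    S8: $\mathcal{L}(Z) = \mathcal{L}(D_5[1])$

Fig. 12. Data Flow Equations for the Sparse Constant Propagation Example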



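  $\mathcal{L}(A_1)$           | $\mathcal{L}(i) = \top$ | $\mathcal{L}(i)$ = Constant                     | $\mathcal{L}(i) = \bot$
  --------------------+------------+----------------------------------+-----------
  $\mathcal{L}(k) = \top$     | $\top$     | $\top$                           | $\bot$
  $\mathcal{L}(k)$ = Constant | $\top$     | $\langle (\mathcal{L}(k), \mathcal{L}(i)) \rangle$ | $\bot$
  $\mathcal{L}(k) = \bot$     | $\bot$     | $\bot$                           | $\bot$

Fig. 13. Lattice computation for array definition operator, $\mathcal{L}(A_1) = \mathcal{L}_{d[\,]}(\mathcal{L}(k), \mathcal{L}(i))$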



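  $\mathcal{L}(A[k])$                                 | $\mathcal{L}(k) = \top$ | $\mathcal{L}(k)$ = Constant                                              | $\mathcal{L}(k) = \bot$
  -------------------------------------------+------------+-----------------------------------------------------------+-----------
  $\mathcal{L}(A) = \top$                             | $\top$     | $\top$                                                    | $\bot$
  $\mathcal{L}(A) = \langle (i_1, e_1), \ldots \rangle$ | $\top$     | $e_j$, if $\exists\, (i_j, e_j) \in \mathcal{L}(A)$ with $i_j = \mathcal{L}(k)$;  | $\bot$
                                             |            | $\bot$, otherwise                                         |
  $\mathcal{L}(A) = \bot$                             | $\bot$     | $\bot$                                                    | $\bot$

Fig. 14. Lattice computation for array reference operator, $\mathcal{L}(A[k]) = \mathcal{L}_{[\,]}(\mathcal{L}(A), \mathcal{L}(k))$

Initialization:

    $\mathcal{L}(v) \leftarrow \top$ for all local variables, $v$.
    insert($E_v$, work_list) for each equation $E_v$ defining $v$
        such that $v$ is assigned to in the entry block.

Body:

    while (work_list != empty)
        $E_v \leftarrow$ remove(work_list)
        reevaluate($E_v$)
        insert($E_{v'}$, work_list) for each equation $E_{v'}$ that uses $E_v$
    end while

Fig. 15. Algorithm for Sparse Constant propagation (SC) for array and scalar variables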
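
The worklist algorithm of Fig. 15 can be exercised on the running example of Figs. 10 to 12 with a small Python sketch. Everything below is an assumption made for illustration: array lattice values are encoded as dictionaries from index to constant, the size bound Z is ignored, users are re-inserted only when a value actually changes, and the names (`solve`, `define`, `ref`, `times`) are hypothetical.

```python
from collections import deque

TOP, BOT = "TOP", "BOT"  # BOT stands for the bottom element


def meet(a, b):                      # Fig. 8, on dicts index -> constant
    if a == TOP:
        return b
    if b == TOP:
        return a
    if a == BOT or b == BOT:
        return BOT
    common = {i: e for i, e in a.items() if b.get(i) == e}
    return common or BOT             # an empty list is equivalent to bottom


def def_phi(new, old):               # Fig. 9, without the size bound Z
    if new == TOP or new == BOT:
        return new
    if old == TOP:
        return TOP
    return dict(new) if old == BOT else {**old, **new}


def define(k, v):                    # Fig. 13: A[k] := v
    if k == BOT or v == BOT:
        return BOT
    return TOP if TOP in (k, v) else {k: v}


def ref(a, k):                       # Fig. 14: value of A[k]
    if a == BOT or k == BOT:
        return BOT
    return TOP if (a == TOP or k == TOP) else a.get(k, BOT)


def times(x, c):                     # multiply a scalar lattice value by a literal
    return x if x in (TOP, BOT) else x * c


def solve(I):
    """Solve the equations of Fig. 12; I is TOP, BOT, or an integer constant."""
    val = {v: TOP for v in ("Y1", "Y2", "D1", "D2", "D3", "D4", "D5", "Z")}
    val.update(Y0=BOT, D0=BOT, I=I)  # values in effect on entry
    eqs = {  # defined variable -> (equation, variables it uses)
        "Y1": (lambda v: {3: 99}, []),
        "Y2": (lambda v: def_phi(v["Y1"], v["Y0"]), ["Y1"]),
        "D1": (lambda v: define(1, times(ref(v["Y2"], 3), 2)), ["Y2"]),
        "D2": (lambda v: def_phi(v["D1"], v["D0"]), ["D1"]),
        "D3": (lambda v: define(1, times(ref(v["Y2"], v["I"]), 2)), ["Y2"]),
        "D4": (lambda v: def_phi(v["D3"], v["D0"]), ["D3"]),
        "D5": (lambda v: meet(v["D2"], v["D4"]), ["D2", "D4"]),
        "Z": (lambda v: ref(v["D5"], 1), ["D5"]),
    }
    work = deque(["Y1", "Y2"])       # S1 and S2 belong to the entry block
    while work:
        x = work.popleft()
        new = eqs[x][0](val)
        if new != val[x]:
            val[x] = new
            work.extend(y for y, (_, uses) in eqs.items() if x in uses)
    return val
```

Calling `solve(BOT)` models an unknown subscript I, in which case Z ends at bottom while element 1 of D2 stays constant; calling `solve(3)` models the case discussed in the text where I is known to be 3 and element 1 of D5 is recorded as a constant.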



4 Sparse Conditional Constant Propagation 

4.1 Lattice Values of Executable Flags for Nodes and Edges 

As in the SCC algorithm in [5], our array conditional constant propagation
algorithm maintains executable flags associated with each node and each edge
in the CFG. Flag $X_{n_i}$ indicates whether node $n_i$ may be executed, and $X_{e_i}$
indicates whether edge $e_i$ may be traversed. The lattice value of an execution
flag is either NO or MAYBE, corresponding to unreachable code and reachable
code respectively. The lattice value for an execution flag is initialized to NO, and
can be lowered to MAYBE in the course of the constant propagation algorithm.
In practice, control dependence identities can be used to reduce the number of
executable flag variables in the data flow equations, e.g., a single flag can be
used for all CFG nodes that are control equivalent. For the sake of simplicity,
we ignore such optimizations in this paper.

The executable flag of a node is computed from the executable flags of its
incoming edges. The executable flag of an edge is computed from the executable
flag of its source node and knowledge of the branch condition variable used to
determine the execution path from that node. These executable flag mappings
are summarized in Fig. 16 for a node $n$ with two incoming edges, $e1$ and $e2$,
and two outgoing edges, $e3$ and $e4$. The first function table in Fig. 16 defines
the join operator $\sqcap$ on executable flags such that $X_n = X_{e1} \sqcap X_{e2}$. We introduce
a true operator $\mathcal{L}_T$ and a false operator $\mathcal{L}_F$ on lattice values such that $X_{e3} =
\mathcal{L}_T(X_n, \mathcal{L}(C))$ and $X_{e4} = \mathcal{L}_F(X_n, \mathcal{L}(C))$. Complete function tables for the $\mathcal{L}_T$
and $\mathcal{L}_F$ operators are also shown in Fig. 16. Note that all three function tables
are monotonic with respect to their inputs.

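Other cases for mapping a node's $X_n$ value to the $X_e$ values of its outgoing
edges can be defined similarly. If $n$ has exactly one outgoing edge $e$, then $X_e =$

Definition of join operator, $X_n = X_{e1} \sqcap X_{e2}$:

  $X_n$            | $X_{e2}$ = NO | $X_{e2}$ = MAYBE
  -----------------+-------------+----------------
  $X_{e1}$ = NO    | NO          | MAYBE
  $X_{e1}$ = MAYBE | MAYBE       | MAYBE

Definition of true operator for branch condition $C$,
$X_{e3} = \mathcal{L}_T(X_n, \mathcal{L}(C))$:

  $X_{e3}$      | $\mathcal{L}(C) = \top$ | $\mathcal{L}(C)$ = TRUE | $\mathcal{L}(C)$ = FALSE | $\mathcal{L}(C) = \bot$
  --------------+------------+-------------+--------------+-----------
  $X_n$ = NO    | NO         | NO          | NO           | NO
  $X_n$ = MAYBE | NO         | MAYBE       | NO           | MAYBE

Definition of false operator for branch condition $C$,
$X_{e4} = \mathcal{L}_F(X_n, \mathcal{L}(C))$:

  $X_{e4}$      | $\mathcal{L}(C) = \top$ | $\mathcal{L}(C)$ = TRUE | $\mathcal{L}(C)$ = FALSE | $\mathcal{L}(C) = \bot$
  --------------+------------+-------------+--------------+-----------
  $X_n$ = NO    | NO         | NO          | NO           | NO
  $X_n$ = MAYBE | NO         | NO          | MAYBE        | MAYBE

Fig. 16. Executable flag mappings for join operator ($\sqcap$), true operator ($\mathcal{L}_T$), and false
operator ($\mathcal{L}_F$)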



  $\mathcal{L}(k_3)$      | $X_{e2}$ = NO | $X_{e2}$ = MAYBE
  -----------------+-------------+---------------------------
  $X_{e1}$ = NO    | $\top$      | $\mathcal{L}(k_2)$
  $X_{e1}$ = MAYBE | $\mathcal{L}(k_1)$   | $\mathcal{L}(k_1) \sqcap \mathcal{L}(k_2)$

Fig. 17. $k_3 := \Phi(k_1, X_{e1}, k_2, X_{e2})$, where execution flags $X_{e1}$ and $X_{e2}$ control the
selection of $k_1$ and $k_2$ respectively

$X_n$. If node $n$ has more than two outgoing edges then the mapping for each edge
is similar to the $\mathcal{L}_T$ and $\mathcal{L}_F$ operators.

Recall that the $\mathcal{L}_\Phi$ function for a control $\Phi$ was defined in Sect. 3.2 by the meet
function, $\mathcal{L}_\Phi(\mathcal{L}(A_2), \mathcal{L}(A_1)) = \mathcal{L}(A_1) \sqcap \mathcal{L}(A_2)$. For the extended SCC algorithm
described in this section, we use the definition of $\mathcal{L}_\Phi$ shown in Fig. 17. This
definition uses executable flags $X_{e1}$ and $X_{e2}$, where $e1$ and $e2$ are the incoming
control flow edges for the $\Phi$ function. Thus, the lattice values of the executable
flags $X_{e1}$ and $X_{e2}$ are used as compile-time approximations of @ variables $@k_1$
and $@k_2$.
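
The function tables of Figs. 16 and 17 translate directly into code. The sketch below is illustrative only: the function names, the use of Python's `True` and `False` for the constants TRUE and FALSE, and the scalar meet helper are assumptions introduced here.

```python
TOP, BOT = "TOP", "BOT"
NO, MAYBE = "NO", "MAYBE"


def flag_join(x1, x2):
    """Node flag from two incoming edge flags (first table of Fig. 16)."""
    return MAYBE if MAYBE in (x1, x2) else NO


def true_edge(xn, c):
    """True operator: the edge may be taken if C is TRUE or unknown."""
    return MAYBE if xn == MAYBE and (c is True or c == BOT) else NO


def false_edge(xn, c):
    """False operator: the edge may be taken if C is FALSE or unknown."""
    return MAYBE if xn == MAYBE and (c is False or c == BOT) else NO


def scalar_meet(a, b):
    """Meet of two scalar lattice values."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def cond_phi(k1, x1, k2, x2):
    """Fig. 17: an operand only contributes if its incoming edge may execute."""
    a = k1 if x1 == MAYBE else TOP
    b = k2 if x2 == MAYBE else TOP
    return scalar_meet(a, b)
```

Treating an operand on a non-executable edge as top is what lets a constant survive the merge when the other path is unreachable, for example `cond_phi(2, MAYBE, BOT, NO)` yields 2.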

4.2 Sparse Conditional Constant Propagation Algorithm 

We introduce our algorithm by first explaining how it works for acyclic scalar
code. Consider the example program shown in Fig. 18. The basic blocks are
labeled n1, n2, n3 and n4. Edges e12, e13, e24 and e34 connect nodes in the obvious
way. The control flow following block n1 depends on the value of variable n.

The first step is to transform the example in Fig. 18 to partial Array SSA
form (with no @ variables) as shown in Fig. 19. Note that since $k$ had multiple
assignments in the original program, a $\phi$ function is required to compute $k_3$ as
a function of $k_1$ and $k_2$.

The second step is to use the partial Array SSA form to create a set of data
flow equations on lattice values for use by our constant propagation algorithm.
The conversion to equations is performed as follows. There is one equation
created for each assignment statement in the program. There is one equation
created for each node in the CFG. There is one equation created for each edge in
the CFG. The equations for the assignments in our example are shown in Fig. 20.
The equations for the nodes and edges in our example are found in Figs. 21 and
22 respectively.

The lattice operations $\mathcal{L}_<$, $\mathcal{L}_*$, and $\mathcal{L}_{max}$ use specific knowledge of their
operation as well as the lattice values of their operands to compute resulting
lattice values. For example, $\mathcal{L}_*(\bot, \mathcal{L}(0))$ results in $\mathcal{L}(0)$ because the result
of multiplying 0 by any number is 0.

Next, we employ a work list algorithm, shown in Fig. 23, that reevaluates
the equations until there are no further changes. A solution to the data flow
equations identifies lattice values for each variable in the Array SSA form of the
program, and for each node executable flag and edge executable flag in the CFG.
Reevaluation of an equation associated with an assignment may cause equations
associated with other assignments to be inserted on the work list. If the value
appears in a conditional expression, it may cause one of the equations associated
with edges to be inserted on the work list. Reevaluation of an edge’s executable
flag may cause an equation for a destination node’s executable flag to be inserted
on the work list. If reevaluation of a node’s executable flag indicates that the
node may be evaluated, then the equations associated with assignments within
that node are added to the work list. When the algorithm terminates, the
lattice values for variables identify the constants in the program, and the lattice
values for executable flags of nodes identify unreachable code.

    n1:  i := 1
         C := i < n
         if C then
    n2:      k := 2 * i
         else
    n3:      k := 2 * n
         endif
    n4:  print k

Fig. 18. Acyclic Scalar Example



    n1:  i := 1
         C := i < n
         if C then
    n2:      $k_1 := 2 * i$
         else
    n3:      $k_2 := 2 * n$
         endif
    n4:  $k_3 := \phi(k_1, k_2)$
         print $k_3$

Fig. 19. Partial Array SSA form for the Acyclic Scalar Example



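    $\mathcal{L}(i) = \mathcal{L}(1)$
    $\mathcal{L}(C) = \mathcal{L}_<(\mathcal{L}(i), \mathcal{L}(n))$
    $\mathcal{L}(k_1) = \mathcal{L}_*(\mathcal{L}(2), \mathcal{L}(i))$
    $\mathcal{L}(k_2) = \mathcal{L}_*(\mathcal{L}(2), \mathcal{L}(n))$
    $\mathcal{L}(k_3) = \mathcal{L}_\Phi(\mathcal{L}(k_1), X_{e24}, \mathcal{L}(k_2), X_{e34})$

Fig. 20. Equations for Assignments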



    $X_{n1}$ = TRUE
    $X_{n2} = X_{e12}$
    $X_{n3} = X_{e13}$
    $X_{n4} = X_{e24} \sqcap X_{e34}$

Fig. 21. Equations for Nodes

    $X_{e12} = \mathcal{L}_T(X_{n1}, \mathcal{L}(C))$
    $X_{e13} = \mathcal{L}_F(X_{n1}, \mathcal{L}(C))$
    $X_{e24} = X_{n2}$
    $X_{e34} = X_{n3}$

Fig. 22. Equations for Edges



Initialization:

    $\mathcal{L}(v) \leftarrow \top$ for all local variables, $v$.
    $X_n \leftarrow$ MAYBE where $X_n$ is the executable flag for the entry node.
    $X_n \leftarrow$ NO where $X_n$ is the executable flag for any node other than the entry node.
    $X_e \leftarrow$ NO where $X_e$ is the executable flag for any edge.
    insert($E_v$, work_list) for each equation $E_v$ defining $v$
        such that $v$ is assigned to in the entry block.

Body:

    while (work_list != empty)
        $E_v \leftarrow$ remove(work_list)
        reevaluate($E_v$)
        insert($E_{v'}$, work_list) for each equation $E_{v'}$ that uses $E_v$
        if $E_v$ defines the executable flag for some node $n$ then
            insert($E_{v'}$, work_list) for each equation $E_{v'}$ defining $v'$
                such that $v'$ is assigned to in block $n$.
        end if
    end while

Fig. 23. Sparse Conditional Constant propagation (SCC) algorithm for scalar and
array variables
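
To make the interplay of assignment, node, and edge equations concrete, here is a hedged Python sketch of the algorithm of Fig. 23 applied to the acyclic scalar example (Figs. 18 to 22). The encoding is an assumption: flags and lattice values are strings or Python constants, operators are lifted without special cases such as multiplication by zero, users are re-inserted only when a value changes, and names such as `scc`, `lift` and `edge` are hypothetical.

```python
from collections import deque

TOP, BOT, NO, MAYBE = "TOP", "BOT", "NO", "MAYBE"


def lift(op):
    """Lift a binary operator to the scalar lattice (no 0 * x special case)."""
    def lifted(a, b):
        if BOT in (a, b):
            return BOT
        return TOP if TOP in (a, b) else op(a, b)
    return lifted


def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def edge(xn, c, want):               # true operator (want=True), false operator (want=False)
    return MAYBE if xn == MAYBE and (c is want or c == BOT) else NO


def phi(k1, x1, k2, x2):             # Fig. 17
    return meet(k1 if x1 == MAYBE else TOP, k2 if x2 == MAYBE else TOP)


def scc(n):
    """n is the lattice value of the input variable n: BOT or an integer."""
    lt, mul = lift(lambda a, b: a < b), lift(lambda a, b: a * b)
    v = dict.fromkeys(["i", "C", "k1", "k2", "k3"], TOP)
    v.update(dict.fromkeys(["Xn2", "Xn3", "Xn4", "Xe12", "Xe13", "Xe24", "Xe34"], NO))
    v.update(n=n, Xn1=MAYBE)         # entry node is executable
    eqs = {  # target -> (equation, operands)
        "i": (lambda: 1, []),
        "C": (lambda: lt(v["i"], v["n"]), ["i"]),
        "k1": (lambda: mul(2, v["i"]), ["i"]),
        "k2": (lambda: mul(2, v["n"]), []),
        "k3": (lambda: phi(v["k1"], v["Xe24"], v["k2"], v["Xe34"]),
               ["k1", "k2", "Xe24", "Xe34"]),
        "Xe12": (lambda: edge(v["Xn1"], v["C"], True), ["C"]),
        "Xe13": (lambda: edge(v["Xn1"], v["C"], False), ["C"]),
        "Xn2": (lambda: v["Xe12"], ["Xe12"]),
        "Xn3": (lambda: v["Xe13"], ["Xe13"]),
        "Xe24": (lambda: v["Xn2"], ["Xn2"]),
        "Xe34": (lambda: v["Xn3"], ["Xn3"]),
        "Xn4": (lambda: MAYBE if MAYBE in (v["Xe24"], v["Xe34"]) else NO,
                ["Xe24", "Xe34"]),
    }
    block = {"Xn2": ["k1"], "Xn3": ["k2"], "Xn4": ["k3"]}  # assignments per node
    work = deque(["i", "C"])         # assignments of the entry block n1
    while work:
        x = work.popleft()
        new = eqs[x][0]()
        if new == v[x]:
            continue
        v[x] = new
        work.extend(y for y, (_, ops) in eqs.items() if x in ops)
        work.extend(block.get(x, []))  # a node became executable
    return v
```

With a known input such as `scc(5)` the condition is TRUE, node n3 stays unreachable, and k3 is the constant 2; with `scc(BOT)` both branches may execute and k3 is lowered to bottom.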

Even though we assumed an acyclic CFG in the above discussion, the
algorithm in Fig. 23 can be used unchanged for performing constant propagation
analysis on a CFG that may have cycles. The only difference is that the CFG
may now contain back edges. Each back edge will be evaluated when its source
node is modified. The evaluation of this back edge may result in the reevaluation
of its target node.

As in past work, it is easy to show that the algorithm must take at most
$O(Q)$ time, where $Q$ is the number of data flow equations, assuming that the
maximum arity of a function is constant and the maximum height of the lattice
is constant.

As an example with array variables, Fig. 24 lists the data flow equations
for the assignment statements in the Array SSA program in Fig. E| (the data
flow equations for nodes and edges follow the CFG structure as in Figs. 21 and
22). Given the definition of lattice elements for array variables from Sect. 3.1,
the conditional constant propagation algorithm in Fig. 23 can also be used
unchanged for array variables.

    $\mathcal{L}(A_0) = \bot$
    $\mathcal{L}(i) = \mathcal{L}(1)$
    $\mathcal{L}(C) = \mathcal{L}_<(\mathcal{L}(i), \mathcal{L}(n))$
    $\mathcal{L}(k) = \mathcal{L}_*(\mathcal{L}(2), \mathcal{L}(i))$
    $\mathcal{L}(A_1) = \mathcal{L}_{d[\,]}(\mathcal{L}(k), \mathcal{L}(i))$
    $\mathcal{L}(A_2) = \mathcal{L}_{d\phi}(\mathcal{L}(A_1), \mathcal{L}(A_0))$
    $\mathcal{L}(A_3) = \mathcal{L}_\Phi(\mathcal{L}(A_2), X_{e24}, \mathcal{L}(A_0), X_{e34})$

Fig. 24. Equations for Assignments from Fig. El



5 Related Work 

Static single assignment (SSA) form for scalar variables has been a significant 
advance. It has simplified the design of some optimizations and has made other 
optimizations more effective. The popularity of SSA form surged after an efficient 
algorithm for computing SSA form was made available 0. SSA form is now a 
standard representation used in modern optimizing compilers in both industry 
and academia. 

However, it has been widely recognized that SSA form is much less effective 
for array variables than for scalar variables. The approach recommended in ^ 
is to treat an entire array like a single scalar variable in SSA form. The most 
serious limitation of this approach is that it lacks precise data flow information 
on a per-element basis. Array SSA form addresses this limitation by providing
$\Phi$ functions that can combine array values on a per-element basis. The
constant propagation algorithm described in this paper can propagate lattice values
through $\Phi$ functions in Array SSA form, just like any other operation/function
in the input program.

The problem of conditional constant propagation for scalar variables has been 
studied for several years. Wegbreit [Z| provided a general algorithm for solving 
data flow equations; his algorithm can be used to perform conditional constant 
propagation and more general combinations of program analyses. However, his 
algorithm was too slow to be practical for use on large programs. Wegman 
and Zadeck 0 introduced a Sparse Conditional Constant (SCC) propagation 
algorithm that is as precise as the conditional constant propagation obtained by 
Wegbreit's algorithm, but runs faster than Wegbreit's algorithm by a speedup
factor that is at least $O(V)$, where $V$ is the number of variables in the program.
The improved efficiency of the SCC algorithm made it practical to perform 
conditional constant propagation on large programs, even in the context of 
industry-strength product compilers. The main limitation of the SCC algorithm 
is a conceptual one — the algorithm operates on two “worklists” (one containing 
edges in the SSA graph and another containing edges from the control flow
graph) rather than on data flow equations. The lack of data flow equations
makes it hard to combine the SCC algorithm with other program analyses. The
problem of combining different program analyses based on scalar SSA form has
been addressed by Click and Cooper in [2], where they present a framework
for combining constant propagation, unreachable-code elimination, and value
numbering that explicitly uses data flow equations.

Of the conditional constant propagation algorithms mentioned above, our
work is most closely related to that of [2], with two significant differences. First,
our algorithm performs conditional constant propagation through both scalar
and array references, while the algorithm in [2] is limited only to scalar variables.
Second, the framework in [2] uses control flow predicates instead of execution
flags. It wasn’t clear from the description in [2] how their framework deals with
predicates that are logical combinations of multiple branch conditions; it appears
that they must either allow the possibility of an arbitrary size predicate
expression appearing in a data flow equation (which would increase the worst-case
execution time complexity of their algorithm) or they must sacrifice precision by
working with an approximation of the predicate expression.

6 Conclusions 

We have presented a new sparse conditional constant propagation algorithm for 
scalar and array references based on Array SSA form 0. Array SSA form has 
two advantages: It is designed to support analysis of arrays at the element level 
and it employs a new $\Phi$ function that is a pure function of its operands, and
can be manipulated by the compiler just like any other operator in the input 
program. 

The original sparse conditional constant propagation algorithm in [3 dealt 
with control flow and data flow separately by maintaining two distinct work 
lists. Our algorithm uses a single set of data flow equations and is therefore 
conceptually simpler. In addition to being simpler, the algorithm presented in 
this paper is more powerful than its predecessors in that it handles constant 
propagation through array elements. It is also more effective because its use 
of data flow equations allows it to be totally integrated with other data flow 
algorithms, thus making it easier to combine other analyses with conditional 
constant propagation.

Abstract. This paper describes an empirical comparison of four
context-insensitive pointer alias analysis algorithms that use varying degrees of
flow-sensitivity: a flow-insensitive algorithm that tracks variables whose
addresses were taken and stored; a flow-insensitive algorithm that
computes a solution for each function; a variant of this algorithm that uses
precomputed kill information; and a flow-sensitive algorithm. In addition
to contrasting the precision and efficiency of these analyses, we describe
implementation techniques and quantify their analysis-time speed-up.



1 Introduction 

To effectively analyze programs written in languages that make extensive use
of pointers, such as C, C++, or Java (in the form of references), knowledge of
pointer behavior is required. Without such knowledge, conservative assumptions 
about pointer values must be made, resulting in less precise data flow infor- 
mation, which can adversely affect the effectiveness of analyses and tools that 
depend on this information. 

A pointer alias analysis is a compile-time analysis that, for each program 
point, attempts to determine what a pointer can point to. As such an analysis
is, in general, undecidable l,'S4j , approximation methods have been developed.
These algorithms provide trade-offs between the efficiency of the analysis and 
the precision of the computed solution. The goal of this work is to quantify how 
the use of flow-sensitivity affects precision and efficiency. 

Although several researchers have provided empirical results of their tech- 
niques, comparisons among algorithms can be difficult because of differing pro- 
gram representations, benchmark suites, and metrics. By holding these factors 
constant, we can focus more on the efficacy of the algorithms and less on the 
manner in which the results were obtained. 

The contributions of this paper are the following:

— empirical results that measure the precision and efficiency of four pointer
alias analysis algorithms with varying degrees of flow-sensitivity;

— empirical evidence of how various implementation enhancements significantly
improved analysis time of the flow-sensitive analysis.

* This work was supported in part by the National Science Foundation under grant
CCR-9633010, by IBM Research, and by SUNY at New Paltz Research and Creative
Project Awards.

In addition to the use of flow-sensitivity, other factors that affect the
cost/precision trade-offs of pointer alias analyses include the use of context-sensitivity
and the manner in which aggregates (arrays and structs) and the heap are
modeled. Our experiments hold these factors constant so that the results only reflect
the usage of flow-sensitivity.

Section 2 highlights the four algorithms and their implementations. Section 3
describes the empirical study of the four algorithms, analyzes the results, and
contrasts them with related results from other researchers. Section 4 overviews
some of the performance-improving enhancements made in the implementation
and quantifies their analysis-time speed-up. Section 5 describes other related
work. Section 6 states conclusions.

2 Analyses and Implementation 

One manner of classifying interprocedural data flow analyses is whether they 
consider control flow information during the analysis. A flow-sensitive analysis 
considers control flow information of a procedure during its analysis of the pro- 
cedure. A flow-insensitive analysis does not consider control flow information 
during its analysis, and thus can be more efficient, but less precise. (See for 
a full discussion of these definitions.) 

The algorithms we consider, listed in order of increasing precision, are 

AT: a flow-insensitive algorithm that computes one solution set for the entire 
program that contains all named objects whose address has been taken and 
stored, 

FI: a flow-insensitive algorithm H E] that computes a solution set for every 
function, 

FIK: a flow-insensitive algorithm 00 that computes a solution set for every 
function, but attempts to improve precision by using precomputed (flow- 
sensitive) kill information, 

FS: a flow-sensitive algorithm 00 that computes a solution set for every pro- 
gram point. 

The following sections provide further information about these analyses and their 
implementation. 

2.1 Algorithms 

The program is represented as a program call (multi-) graph (PCG), in which a
node corresponds to a function, and a directed edge represents a call to the target
function.¹ Each function body is represented by a control flow graph (CFG). This
graph is used to build a simplified sparse evaluation graph (SEG) 0, discussed
in Section 4.

¹ Potential calls can occur due to function pointers and virtual methods, in which the
called function is not known until run time.

The address-taken analysis (AT) computes its solution by making a single
pass over all functions in the program,² adding to a global set all variables
whose addresses have been assigned to another variable. These include actual
parameters whose addresses are stored in the corresponding formal. Examples
are statements such as “p = &a;”, “q = new ...;”, and “foo(&a);”, but not
simple expression statements such as “&a;” because the address was not stored.
AT is efficient because it is linear in the size of the program and uses a single
solution set, but it can be very imprecise. It is provided as a base case for
comparison to the other three algorithms presented in this paper.

The general manner in which the other three analyses compute their solu- 
tions is the same. A nested fixed point computation is used in which the outer 
nest corresponds to computing solutions for each function in the PCG. Each 
such function computation triggers the computation of a local solution for all 
program points that are distinguished in the particular analysis. For the
flow-sensitive (FS) analysis, the local solution corresponds to each CFG node in the
function. For the other two flow-insensitive analyses (FI, FIK), the local solution
corresponds to one set that conservatively represents what can hold anywhere 
in the function. This general framework is presented as an iterative algorithm 
in Fig. 1 and is further described in |^. An extension to handle virtual methods
is described in . Section 4 reports improvements due to the use of a
worklist-based implementation.



S1:  build the initial PCG
S2:  foreach procedure, p, in the PCG, loop
S3:      initialize interprocedural alias sets of p to {}
S4:  end loop
S5:  repeat
S6:      foreach procedure, p, in the PCG, loop
S7:          using the interprocedural alias sets (for entry of p and call sites in p),
                 compute the intraprocedural alias sets of p
S8:          using the intraprocedural alias sets of p,
                 update the interprocedural alias sets representing
                 the effect of p on each procedure that calls or is called by p
S9:      end loop
S10:     using new function pointer aliases, update the PCG,
             initializing interprocedural alias sets of new functions to {}
S11: until the interprocedural alias sets and the PCG converge

Fig. 1. High-level description of general algorithm



As the PCG is not used, this can include functions that are not called.

The FS, FI, and FIK analyses utilize the compact representation [3 E] to
represent alias relations. This representation shares the property of the points-to
representation in that it captures the “edge” information of alias relations.
For example, if variable a points to b, which in turn points to c, the compact
representation records only the following alias set: {(*a, b), (*b, c)}, from which
it can be inferred that (**a, c) and (**a, *b) are also aliases.³
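
As a rough illustration of how the implied pairs can be recovered from the
compact form, the sketch below expands a compact alias set by one extra level
of dereference. The function name `expand_one_level` and the string encoding of
expressions (one leading `*` per dereference) are assumptions made for this
example only; they are not taken from the implementation described in this
paper.

```python
def expand_one_level(compact):
    """Add the alias pairs implied by one extra dereference.

    `compact` is a set of pairs (lhs, rhs) meaning "lhs and rhs may refer to
    the same storage", e.g. ("*a", "b") when a points to b.
    """
    explicit = set(compact)
    for lhs, rhs in compact:
        # Objects that *rhs is recorded as aliased to (rhs is itself a pointer).
        targets = [r2 for (l2, r2) in compact if l2 == "*" + rhs]
        if targets:
            # Dereferencing both sides of an alias pair yields another pair ...
            explicit.add(("*" + lhs, "*" + rhs))
            # ... and *lhs also reaches whatever *rhs is aliased to.
            explicit.update(("*" + lhs, t) for t in targets)
    return explicit


compact = {("*a", "b"), ("*b", "c")}
print(sorted(expand_one_level(compact)))
```

For the two-pair compact set in the text, this yields the four pairs of the
explicit representation. Longer pointer chains would need the step to be
repeated, which this sketch does not attempt.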

All analyses are context-insensitive; they merge information flowing from
different calls to the same function, and may suffer from the unrealizable path
problem 1221, i.e., they potentially propagate back to the wrong caller the aliases
of the called function. (Sections 3.2 and 4.4 discuss this potential imprecision.)
Context-sensitive analyses EH] do not suffer from this problem, but may
increase time/space costs.

As in I21I2I, all analyses considered here represent the (possibly many)
objects allocated at calls to new or malloc by creating a named object based on
the CFG node number of the allocation statement. These objects are referred to
as heap_n, where n is the CFG node number of the allocation statement. These
names are unique throughout the entire program. More precise heap modeling
schemes [21 121 III [71 0 im iiniiTni can improve precision, but may also
increase time/space costs. Quantifying the effects of using context-sensitivity
and various heap models is beyond the scope of this work.

Consider the simple program in Fig. 2. The AT analysis computes only one
set of objects, which it assumes all pointers may point to. This set will contain
five objects (heap_S1, heap_S3, heap_S4, heap_S6, and heap_S7), all of which will
appear to be referenced at S8.
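
The AT result for this example can be reproduced with a very small sketch. The
statement encoding below (a label, a kind, and an operand per statement) is a
hypothetical one chosen for illustration, and the sketch only models heap
allocations; the analysis in the paper also collects variables whose address
is stored.

```python
# Hypothetical encoding of the Fig. 2 program: (label, kind, operand).
PROGRAM = {
    "main": [("S1", "alloc", "p"), ("S2", "call", "f"), ("S3", "alloc", "p")],
    "f": [("S4", "alloc", "p"), ("S5", "call", "g"),
          ("S6", "alloc", "p"), ("S7", "alloc", "q")],
    "g": [("S8", "deref", "p")],
}


def address_taken_objects(program):
    """Collect one program-wide set of named objects in a single linear pass."""
    objects = set()
    for statements in program.values():  # every function, called or not
        for label, kind, _operand in statements:
            if kind == "alloc":
                objects.add("heap_" + label)  # name the object by allocation site
    return objects


def objects_referenced(program, label):
    """Under AT every dereference resolves to the same single set."""
    return address_taken_objects(program)


print(sorted(objects_referenced(PROGRAM, "S8")))
```

Because there is only one set and no propagation, the cost is linear in the
number of statements, which matches the efficiency argument made for AT.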





T *p, *q;

void main(void) {
S1:   p = new T;
S2:   f();
S3:   p = new T;
}

void f(void) {
S4:   p = new T;
S5:   g();
S6:   p = new T;
S7:   q = new T;
}

void g(void) {
S8:   T t = *p;
}

Fig. 2. Example program





The FI analysis does not use any intraprocedural control flow information.
Instead it conservatively computes what can hold anywhere within a function,
i.e., for each function, f, it uses only one alias set, Holds_f, to represent what
may hold at any CFG node in f. Thus, in Fig. 2 the FI analysis assumes that
(*p, heap_S1) and (*p, heap_S3) can flow into f. This results in Holds_g = Holds_f =
{(*p, heap_S1), (*p, heap_S3), (*p, heap_S4), (*p, heap_S6), (*q, heap_S7)}, which
results in four objects, heap_S1, heap_S3, heap_S4, and heap_S6, potentially being
referenced at S8.⁴

³ See [.'-{( )l lH] for a discussion of precision trade-offs between this representation and an
explicit representation, which would contain all four alias pairs.

The FIK analysis attempts to improve the precision of the flow-insensitive
analysis by precomputing kill information for pointers, and then uses this
information during the flow-insensitive analysis at call sites.⁵ For example, the
precomputation will determine that all alias relations involving *p on entry to f
will be killed before the call to g at S5. Thus, Holds_g will contain only the alias
relations that are generated in f and propagated to g, i.e., Holds_g =
{(*p, heap_S4), (*p, heap_S6), (*q, heap_S7)}. This results in two objects,
heap_S4 and heap_S6, potentially being referenced at S8.
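
The difference between FI and FIK on this example can be sketched as a small
fixed point computation over call edges. This is a simplified model written
for illustration, not the implementation evaluated in this paper: the names
`GEN`, `CALLS`, `KILLED_BEFORE`, and `flow_insensitive` are hypothetical, each
function keeps one set of locally generated or returned aliases plus one set
of aliases flowing in from callers, and only the "killed before the call" set
is modeled.

```python
# Aliases generated anywhere inside each function of Fig. 2 (hypothetical encoding).
GEN = {
    "main": {("*p", "heap_S1"), ("*p", "heap_S3")},
    "f": {("*p", "heap_S4"), ("*p", "heap_S6"), ("*q", "heap_S7")},
    "g": set(),
}
CALLS = [("main", "S2", "f"), ("f", "S5", "g")]  # (caller, call site, callee)
# Pointers definitely killed on all paths from function entry to the call site.
KILLED_BEFORE = {"S2": {"*p"}, "S5": {"*p"}}


def flow_insensitive(calls, gen, killed_before=None):
    """Return Holds_f for every function; pass killed_before to model FIK."""
    killed_before = killed_before or {}
    entry = {f: set() for f in gen}         # aliases flowing in from callers
    local = {f: set(gen[f]) for f in gen}   # generated in f or returned to f
    changed = True
    while changed:
        changed = False
        for caller, site, callee in calls:
            killed = killed_before.get(site, set())
            # Entry aliases survive to the call only if their pointer is not killed.
            survivors = {(ptr, obj) for (ptr, obj) in entry[caller]
                         if ptr not in killed}
            passed = local[caller] | survivors
            returned = entry[callee] | local[callee]
            if not passed <= entry[callee]:
                entry[callee] |= passed
                changed = True
            if not returned <= local[caller]:
                local[caller] |= returned
                changed = True
    return {f: entry[f] | local[f] for f in gen}


fi = flow_insensitive(CALLS, GEN)
fik = flow_insensitive(CALLS, GEN, KILLED_BEFORE)
print(sorted(obj for ptr, obj in fi["g"] if ptr == "*p"))
print(sorted(obj for ptr, obj in fik["g"] if ptr == "*p"))
```

Under these assumptions the FI run reports four objects for *p in g, and the
FIK run reports two, in line with the sets given in the text.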

The FS analysis associates an alias set before (In_n) and after (Out_n) every
CFG node, n. For example, Out_S1 = {(*p, heap_S1)} because *p and heap_S1 refer
to the same storage. At the entry to function g, the FS analysis will compute
In_S8 = {(*p, heap_S4)}, which is the precise solution for this simple example.
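
A minimal flow-sensitive sketch for the same example is shown below. It is an
illustration under strong assumptions rather than the paper's algorithm: every
function is straight-line code, all pointers are globals (so a call simply
hands the current alias set to the callee and resumes with the callee's exit
set), and an allocation performs a strong update. The names `PROGRAM` and
`flow_sensitive` are hypothetical.

```python
# Hypothetical encoding of the Fig. 2 program: (label, kind, operand).
PROGRAM = {
    "main": [("S1", "alloc", "*p"), ("S2", "call", "f"), ("S3", "alloc", "*p")],
    "f": [("S4", "alloc", "*p"), ("S5", "call", "g"),
          ("S6", "alloc", "*p"), ("S7", "alloc", "*q")],
    "g": [("S8", "use", "*p")],
}


def flow_sensitive(program):
    """Compute In and Out alias sets for every labeled statement."""
    fn_in = {f: set() for f in program}    # alias set on entry to each function
    fn_out = {f: set() for f in program}   # alias set on exit from each function
    ins, outs = {}, {}
    changed = True
    while changed:
        changed = False
        for fn, statements in program.items():
            current = set(fn_in[fn])
            for label, kind, operand in statements:
                ins[label] = set(current)
                if kind == "alloc":
                    # Strong update: drop the pointer's old targets, add the new one.
                    current = {(p, o) for (p, o) in current if p != operand}
                    current.add((operand, "heap_" + label))
                elif kind == "call":
                    if not current <= fn_in[operand]:
                        fn_in[operand] |= current   # merge over all callers
                        changed = True
                    current = set(fn_out[operand])
                outs[label] = set(current)
            if current != fn_out[fn]:
                fn_out[fn] = current
                changed = True
    return ins, outs


ins, outs = flow_sensitive(PROGRAM)
print(outs["S1"], ins["S8"])
```

The sketch keeps one set per program point, which is where the extra cost of
FS relative to FI comes from.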

This example illustrates the theoretical precision levels of the four analyses, 
from FS (most precise) to AT (least precise). The AT analysis is our most efficient 
analysis because it is linear and only uses one set. The FI analysis is more efficient 
than the FIK analysis because it neither precomputes kill information nor uses 
it during the analysis. One would expect the FS analysis to be the least efficient 
because it needs to distinguish solutions for every point in the program. Thus, 
a theoretical spectrum exists in terms of precision and efficiency with the AT 
analysis on the less precise/more efficient side, the FS analysis on the more 
precise/less efficient side, and the FI and FIK analyses in the middle. 

Not studied in this paper are other flow-insensitive analyses [El 03 that use
one alias set for the whole program and limit the number of alias relations by
sometimes grouping distinct variables into one named object. These analyses fall
in between AT and FI in the theoretical precision spectrum.

2.2 Implementation 

The analyses have been implemented in the NPIC system, an experimental
program analysis system written in C++. The system uses multiple and virtual
inheritance to provide an extensible framework for data flow analyses [Li 1 1 l.t.tj .
A prototype version of the IBM VisualAge C++ compiler USES is used as the
front end. The abstract syntax tree constructed by the front end is transformed
into a PCG and a CFG for each function, which serve as input to the alias
analyses. No CFG is built for library functions. We model a call to a library function
based on its semantics, thereby providing the benefits of context-sensitive
analysis of such calls. Library calls that cannot affect the value of a pointer are treated
as the identity transfer function.

⁴ Although a final intraprocedural flow-sensitive pass can be used to improve
precision [3, this pass has not been implemented.

⁵ Kill information is computed in a single flow-sensitive prepass of each CFG. For each
call site, c, we compute two sets: the set of pointers that are definitely killed on all
paths from entry to c, and the set of pointers that are definitely killed on all paths
from c to exit E1I3. Only the first set is used in our example.

The FS, FI, and FIK analyses are implemented using worklists. (Section 4.2
discusses an earlier iterative implementation.) These three analyses incorporate
function pointer analysis into the pointer alias analysis as described in
giini. Currently, array elements and field components are not distinguished, and
setjmp/longjmp⁶ statements are not supported. The implementation also
assumes that pointer values will only exist in pointer variables, and that pointer
arithmetic does not result in the pointer going beyond array boundaries. As
stated in Section 2.1, heap objects are named based on their allocation site.

To model the values passed as argc and argv to the main function, a dummy
main function was added, which called the benchmark’s main function,
simulating the effects of argc and argv. This function also initialized the _iob array, used
for standard I/O. The added function is similar to the one added by Ruf
and Landi et al. EHl. Initializations of global variables are automatically
modeled as assignment statements in the dummy main function.



3 Results 

Our benchmark suite contains 21 C programs, 18 provided by other researchers
Ea dEni and 3 from the SPEC CINT92 121 and CINT95 [44j benchmarks.⁷
Table 1 describes characteristics of the suite. The third column contains the
number of lines in the source and header files reported by the Unix utility wc.
The fourth column reports the number of user-defined functions (nodes in the
PCG), which include the dummy main function. The next two columns give
the number of call sites, distinguished between user and library function calls.
The next two columns report cumulative statistics for all CFG nodes and edges.
These figures include nodes and edges created by the initialization of globals. The
following column computes the ratio of CFG edges to nodes. The next column
reports the percentage of CFG nodes that are considered pointer-assignment
nodes. The current analysis treats an assignment as a pointer-assignment if the
variable involved in the pointer expression on the left side of the assignment is
declared to be a pointer.⁸ The last two columns report the number of recursive
functions (functions that are in PCG cycles) and heap allocation sites in each
program. The last row of the table reports the average edge/node ratio and the
average pointer-assignment node percentage, both of which are computed by
averaging the corresponding values over the 21 benchmarks.

⁶ Although one program in our benchmark suite, anagram, does syntactically contain
a call to longjmp, the code is unreachable.

⁷ Some programs had to be syntactically modified to satisfy C++’s stricter type
checking semantics. A few program names are different than those reported by Ruf [.'-itij .
The SPEC CINT92 program 052.alvinn was named backprop in Todd Austin’s
benchmark suite P|. Ruf referred to ks as part, and ft as span m-

⁸ This is more conservative than considering statements in which the left side
expression is a pointer. Thus, statements such as “p->field = ...” are treated as
pointer assignments no matter how the type of field is declared. A more
accurate categorization would not affect the precision of the analysis, but could improve
the efficiency by reducing the number of nodes considered during the analysis as
discussed in Section O



Table 1. Static characteristics of benchmark suite 



Name          Source      LOC  Funcs   Call Sites           CFG        Edges/  Ptr-Asgn    Rec   Alloc
                                       User    Lib    Nodes    Edges   Nodes      Nodes  Funcs   Sites
allroots      Landi       227      7     19     35      157      167   1.064       2.6%      2       1
052.alvinn    SPEC92      272      9      8     13      223      243   1.090      11.2%      0       0
01.qbsort     McCat       325      8      9     25      173      187   1.081      24.9%      1       5
15.trie       McCat       358     13     19     21      167      181   1.084      23.4%      3       5
04.bisect     McCat       463      9     11     18      175      189   1.080                 0       2
17.bintr      McCat       496     17     27     28      196      205   1.046        9l%      5       1
anagram       Austin      650     16     22     38      332      365   1.099       9l)%      1       2
lex315        Landi       733     17    102     52      527      615   1.167        TEW      0       3
ks            Austin      782     14     17     67      513      576   1.123      26.9%      0       5
05.eks        McCat     1,202     30     62     49      678      732   1.080       3.8%      0       3
08.main       McCat     1,206     41     68     53      684      700   1.023      24.3%      3       8
09.vor        McCat     1,406     52    174     28      876      917   1.047      27.5%      5       8
loader        Landi     1,539     30     79    102      687      773   1.125        S%%      2       7
129.compress  SPEC95    1,934     25     35     28      596      652   1.094        6A%      0       0
ft            Austin    2,156     38     63     55      732      808   1.104      19.5%      0       5
football      Landi     2,354     58    257    274    2,710    3,164   1.168        rw%      1       0
compiler      Landi     2,360     40    349    107    1,723    2,090   1.213       1.1%     14       0
assembler     Landi     3,446     52    247    243    1,544    1,738   1.126       8.0%      0      16
yacr2         Austin    3,979     59    158    169    2,030    2,328   1.147        Ea%      5      26
simulator     Landi     4,639    111    447    226    2,686    3,123   1.163        Tf%      0       4
099.go        SPEC95   29,637    373  2,054     22   16,823   20,176   1.199         1%      1       0
Average                                                                1.111      10.9%





3.1 Description of Experiment 

This section presents precision and efficiency results. For each benchmark and 
each analysis, we report the analysis time, the maximum memory used, and 
the average number of objects the analysis computes a dereferenced pointer can 
point to. The precision results for the FIK analysis are exactly the same as the 
FI analysis for all benchmarks. Thus, we do not explicitly include this analysis 
in our precision data. 

The experiment was performed on a 132MHz IBM RS/6000 PowerPC 604
with 96MB RAM and 104MB paging space, running AIX 4.1.5. The executable
was built with IBM’s xlC compiler using the “-O3” option.

The analysis time is reported in seconds and does not include the time to
build the PCG and CFGs, but does include any analysis-specific preprocessing,
such as building the SEG from the CFG. This information is displayed in the top
left chart in Fig. 3. The top right chart of this figure reports the high-water mark
of memory usage during the analysis process. This memory usage includes the
intermediate representation, the alias information, and statistics-related data.
This information was obtained by using the “ps v” command under AIX 4.1.5.

To collect precision information, the system traverses the representation,
visiting each expression containing a pointer dereference and, using the computed
alias information, reports how many named objects are aliased to the pointer
expression. We report the average number of objects per such dereference for
both reads and writes. Further precision information is provided in m-

[Charts: Analysis Time (Secs) and Max Memory Usage (MB) per benchmark for the
Flow Sensitive, Flow Insensitive/Kill, Flow Insensitive, and Address Taken
analyses; Reads Precision (Num Objects) and Writes Precision (Num Objects) per
benchmark for the Flow Sensitive, Flow Insensitive, and Address Taken analyses.]

Fig. 3. Analysis time, memory usage, and precision results

A pointer expression with multiple dereferences, such as * * *p, is counted 
as multiple dereference expressions, one for each dereference. The intermediate 
dereferences (*p and **p) are counted as reads. The last dereference (***p) is 
counted as a read or write depending on the context of the expression. Statements 
such as (*p)++ and *p += increment are treated as both a read and a write of 

*p. 
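
The counting rule above can be written down as a small helper. This is only a
sketch of the bookkeeping described in the text; the function name
`count_derefs` and the `context` labels are hypothetical.

```python
def count_derefs(depth, context):
    """Return (reads, writes) counted for an expression with `depth` dereferences.

    `context` describes how the outermost dereference is used:
    "read", "write", or "both" (e.g. (*p)++ or *p += increment).
    """
    if depth < 1:
        raise ValueError("expression must contain at least one dereference")
    if context not in ("read", "write", "both"):
        raise ValueError("unknown context: " + context)
    reads = depth - 1          # intermediate dereferences are always reads
    writes = 0
    if context in ("read", "both"):
        reads += 1
    if context in ("write", "both"):
        writes += 1
    return reads, writes


print(count_derefs(3, "write"))  # ***p = ... : *p and **p are reads, ***p is a write
print(count_derefs(1, "both"))   # (*p)++ : one read and one write of *p
```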

We consider a pointer to be dereferenced if the variable is declared as a
pointer or an array formal parameter, and one or more of the “*”, “->”, or
“[ ]” operators are used with that variable. Formal parameter arrays are included
because their corresponding actual parameter(s) could be a pointer. We do not
count the use of the “[ ]” operator on arrays that are not formal parameters
because the resulting “pointer” (the array name) is constant, and therefore,
counting it may skew results. Figure 4 classifies the type of pointer dereferenced,
averaged over all programs. Information for each benchmark is given in

[Bar chart: percentage of dereferenced pointers that are Formal, Global, or
Local, for Reads and Writes. Recoverable bar labels: 61.1, 38.9, 22.8, 46.6,
18.9, 34.5.]

Fig. 4. Classification of dereferenced pointer types for all programs


The manner in which the heap is modeled must be considered in evaluating
precision results. For example, a model that uses several names for objects in the
heap may seem less precise when compared to a model that uses fewer names [46].
Similarly, analyses that represent invisible objects (objects not lexically visible
in the current procedure) aliased to a formal parameter as a single object⁹ may
report fewer objects. Our analyses do not use this technique.

Assuming a correct input program, each pointer dereference should corre- 
spond to at least one object at run time, and thus one serves as a lower bound for 
the average. Although a precision result close to one demonstrates the analysis is 
precise (modulo heap and invisible object naming), a larger number could reflect 
an imprecise algorithm, a limitation of static analysis, or a pointer dereference 
that corresponds to different memory locations over the program’s execution. 



⁹ This modeling technique can increase the possibility of strong updates |Z|.

[Charts: Flow-Sensitive, Reads; Flow-Sensitive, Writes]

The bars for 099.go are truncated. The object type breakdown for both FS and FI is
        Nonvisible Formal
        Local Local Global Parameter Heap
Reads   O 9T T2 Ol 0
Writes  0.0 8.1 5.4 0.01 0

Fig. 5. Breakdown of average object type pointed to by a dereferenced pointer

The bottom two charts of Fig. 3 provide a graphical layout of precision
information for reads and writes. Fig. 5 refines this information for the FS and
FI analyses by providing a breakdown of the type of object pointed to. Fig. 6
provides similar information averaged over all programs. Charts E and F of this
figure report the percentage of dereferenced pointers that resolve to exactly one
object in our model. If the object is a named variable, as opposed to a heap
object, the pointer dereference could be replaced with the variable. This
information is expanded upon in m-

3.2 Discussion 

As expected, the results from the analysis speed chart of Fig. 3 indicate that
the AT analysis is efficient; it takes less than 0.4 seconds on all programs. The FI
analysis can be over twice as fast as the FS analysis, and was faster than the FS
analysis in all but one program.

The precision of the AT analysis leaves much to be desired. Fig. 6 reports
that, on average, 111.9 objects for reads and 96.68 objects for writes were in the
AT set.¹⁰ As one would expect this set to increase with the size of the program, the
precision for this analysis will worsen with larger programs.

The results also indicate that the FIK analysis is not beneficial. On our 
benchmark suite it is never more precise than the FI analysis, and on some 
occasions requires more analysis time than the FS analysis. One explanation for 
the precision result may be that an alias relation created to simulate a reference 
parameter, in which the formal points to the actual, typically is not killed in 
the called routine, i.e., the formal parameter is not modified, but rather is used 
to access the passed actual. Thus, programs containing these alias relations will 
not benefit from the precomputed kill information. 

One surprising result is the overall precision of the FI analysis. In 12 of the 21 
benchmarks the FI analysis is as precise as the FS analysis. This seems to suggest 
that the added precision obtained by the FS analysis in considering control flow 
within a function is not significant for those benchmarks, at least where pointers 
are dereferenced. We offer two possible explanations: 

1. Pointer variables are often not assigned more than one distinguished ob- 
ject within the same function. Thus, distinguishing program points within 
a function, a key difference between the FS and FI analyses, does not often 
result in an increase in precision. We have seen exceptions to this in the 
function InitLists of the ks benchmark and in the function InsertPoint 
in the 08. main benchmark. In both cases the same pointer is reused in two 
list-traversing loops. 

2. It seems that a large number of alias relations are created at call sites because 
of actual/formal parameter bindings. The lack of a substantial precision 
difference between our FS and FI analyses may be because both algorithms 
rely on the same (context-insensitive) mapping mechanism at call sites. 



¹⁰ The numbers differ because they are weighted with the number of reads and writes
through a pointer in each program.

[Charts A to F: Total Precision Reads (A, B); Total Precision Writes (C, D);
Total Percentage of Resolved Dereferences to One Object, for Reads and Writes
(E, F). Object types: Local, Nonvisible Local, Global, Formal Parameter, Heap.]

Fig. 6. Charts A - D provide the average precision over all benchmarks for reads
and writes. (Charts B and D do not include the AT analysis to allow the difference
between the FS and FI analyses to be visible.) Charts E and F report the percentage
of dereferenced pointers that resolve to one object in our model.

Considering charts B and D of Fig. 6, it seems that FI is as precise as FS
for pointers directed to nonvisible locals and formal parameters. Therefore, FS,
if employed at all, should focus on pointers directed to the heap and global
variables.

The precision results for 099.go merit discussion. An average of 17.03 and
13.64 objects are returned for reads and writes, respectively, with a maximum of
100. This program contains six small list-processing functions (using an
array-based “cursor” implementation) that accept a pointer to the head of a list as
a parameter. One of these functions, addlist, is called 404 times, passing the
address of 100 different actuals for the list header, resulting in 100 aliases for the
formal parameter. However, because the lifetime of the formal is limited to this
function (it does not call any other function), these relations are not propagated
to any other function. Therefore, these relations do not suffer the effects of the
unrealizable path problem mentioned in Section 2.1.

Another conclusion from the results is that analysis time is not only a function
of program size; it also depends on the amount of alias relation propagation along
the PCG and SEGs. For example, 099.go, despite being our largest program
and having a pointer aliased with 100 objects, is analyzed at one of the fastest
rates (3,037 LOC/second, 1,724 CFG nodes/second) because these relations are
not propagated throughout the program.

As suggested by Shapiro and Horwitz and Diwan m, a more precise
and time-consuming alias analysis may not be as inefficient as it may appear,
because the time required to obtain increased precision may reduce the time
required by subsequent analyses that utilize mod-use information, and thus pointer
alias information, as their input. As the previous paragraph suggests, this can
also be true about pointer alias analysis itself, which also utilizes pointer alias
information during its analysis.

3.3 Comparison with Other Results 

Landi et al. m report precision results for the computation of the MOD
problem using a flow-sensitive pointer alias algorithm with limited context-sensitive
information. Among the metrics they report is the number of “thru-deref”
assigns, which corresponds to the “write” metrics reported in Fig. 3. However,
since their results include compiler-introduced temporaries in their “thru-deref”
count I2S1, a direct comparison is not possible.

Stocks et al. m use the same metric without including temporaries for the
flow-sensitive context-sensitive analysis of Landi and Ryder |2Z|. They report
that the average number of objects ranges from 1.0 to 2.0 on the eight common
benchmarks. On these benchmarks our flow-sensitive context-insensitive analysis
ranges from 1.0 to 2.22. Two possible explanations for the slightly less precise
results are 1) their algorithm uses some context-sensitivity; 2) the underlying
representation is not identical, and thus pointer dereferences may not be counted
in the same manner in all cases. For example, statements such as cfree(TP)
located in allroots are treated as modifying the structure deallocated, and thus
as a pointer dereference m- In fact, on the three programs in which our analysis
reports the same, or close to the same, number of “writes” as “thru-derefs”
(allroots, lex315, simulator), our precision is identical to that reported in

The relative precision of the flow-insensitive analysis compared to the flow- 
sensitive analysis is in contrast to the study of Stocks et al. ini, which compares 
the flow-sensitive analysis of Landi and Ryder m with a flow-insensitive anal- 
ysis described in m- For the eight common benchmarks, our flow-insensitive 
algorithm ranges from 1.0 to 2.81 objects on average for a write dereference, com- 
pared to 1.0 to approximately 6.3 for the flow-insensitive analysis they studied. 
The analysis described in m shares the property of Steensgaard’s analysis m 
in that it groups all objects pointed-to by a variable into an equivalence class. 
Although this can lead to spurious alias relations not present in the FI analysis, 
it does allow for an almost linear time algorithm, which has been shown to be 
fast in practice 

Emami et al. report precision results for a flow-sensitive context-sensitive 
algorithm. Their results range from 1.0 to 1.77 for all indirect accesses using a 
heap naming scheme that represents all heap objects with one name. Because 
we were unable to obtain the benchmarks from their suite, a direct comparison 
with our results is not possible. 

Ruf gOj reports both read and write totals for a flow-sensitive
context-insensitive analysis. However, unlike our analysis, he counts use of the “[ ]”
operator on arrays that are not formal parameters as a dereference m. Since such
an array will always point to the same place, the average number of objects is
improved.¹¹ For the 11 benchmarks in common,¹² Ruf reports an overall read
and write average of 1.33 and 1.38, respectively. To facilitate comparisons, we
have also counted in this manner. The results for the common benchmarks are
averages of 1.35 and 1.47 for the FS analysis and 1.41 and 1.54 for the FI
analysis. We attribute the slight differences in the FS analysis to the difference in
representations. As Ruf states, “the VDG intermediate representation often
coalesces series of structure or array operations into a single memory write.”
This coalescing can skew results in either direction.

Shapiro and Horwitz m present an empirical comparison of four
flow-insensitive algorithms. The first algorithm is the same as the AT algorithm.
The remaining three algorithms gma can be less precise and more efficient
than the algorithms studied in this paper.¹³ The authors measure the precision
of these analyses by implementing three dataflow analyses and an interprocedural
slicing algorithm. In addition to these alias analysis clients, the authors also
report the direct precision of the alias analysis algorithms in terms of the total
number of points-to relations. We agree with m Eni that a more meaningful
metric is to measure where the points-to information is used, such as where a
pointer is dereferenced. They conclude 1) a more precise flow-insensitive analysis
generally leads to increased precision by the subsequent analyses that use this
information, with varying magnitudes; 2) metrics measuring the alias analysis
precision tend to be good predictors of the precision of subsequent analyses
that use alias information; and 3) more precise flow-insensitive analysis can also
improve the efficiency of subsequent analyses that use this information.

¹¹ The best illustration of this is in 099.go, which has a large number of array
references, but a low number of pointer dereferences. In this program, the average
changed from 17.03 to 1.13 for reads and from 13.64 to 1.48 for writes when all uses
of the “[ ]” operator were counted.

¹² Although m reports results for ft (under the name span), our version of the
benchmark is substantially larger than the one Ruf analyzed, and thus is not
comparable.

¹³ In theory, the FI analysis can be more precise than Andersen’s algorithm P] because
it considers function scopes, at the cost of using more than one alias set. However,
both algorithms are likely to offer similar precision in practice because the
distinguishing case is probably uncommon.

4 Efficiency Improvements 

This section describes some of the performance-improving enhancements made 
in the implementation and quantifies their effects on analysis-time speed-up for 
the flow-sensitive algorithm. Although novelty is not claimed, the efficacy of each 
technique is shown. 





Fig. 7. Example CFG (left) and procedure P (right). Note that g is a global, a, b, l
are locals of the CFG, and f is a formal parameter of P. Possible topological numbers
(ignoring the back edge 7 → 2) appear next to each node.



4.1 Sharing Alias Sets 

As described in Section 2.1, the flow-sensitive analysis computes solutions before
and after every node in the CFG. However, this can result in storing redundant
information. For example, a node in Fig. 7 that does not affect any pointer value
will always have an Out set equal to its In set. Likewise, a node whose
predecessors' Out sets all have the same value will have an In set equal to this value.
This holds for all nodes in Fig. 7 except #1 and #2. Thus, for all nodes except
#1 and #2, alias sets can be shared. We use the term shared in a literal way:
if the In set is shared with the Out set, they are the same object during the
analysis. This is done with a shallow copy of the alias set data structures. We
precompute these sharing sites in a separate forward pass over the CFG before
performing the alias analysis. Each node that has its own set, which we call
a deep set, is dubbed a “SEG” (Sparse Evaluation Graph) node.¹⁴ Such nodes
have a list of “SEG” predecessors and “SEG” successors that are used during
the analysis. The alias set allocation strategy for a node, n, is summarized as
follows:



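\[
\mathit{In}_n =
\begin{cases}
\mathit{Out}_p, & \text{if } \forall p, q \in \mathit{Pred}(n),\ p \text{ and } q \text{ share } \mathit{Out} \text{ sets} \\
\text{a deep set}, & \text{otherwise}
\end{cases}
\]

\[
\mathit{Out}_n =
\begin{cases}
\mathit{In}_n, & \text{if } n\text{'s transfer function is the identity function} \\
\text{a deep set}, & \text{otherwise}
\end{cases}
\]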

Our current implementation treats all call nodes as SEG nodes. 
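
The allocation strategy can be sketched as a single forward pass over the
nodes in topological order. The code below is an illustration under stated
assumptions, not the NPIC implementation: set identities are plain integers,
the names `assign_alias_sets`, `preds`, and `is_identity` are hypothetical, and
a predecessor reached only through a back edge (not yet visited) conservatively
forces a deep In set. The four-node CFG at the end is a made-up example, not
the CFG of Fig. 7.

```python
from itertools import count


def assign_alias_sets(nodes, preds, is_identity):
    """Decide which In/Out sets are shared and which are deep (freshly allocated).

    nodes       -- node ids in topological order (back edges ignored for ordering)
    preds       -- dict: node id -> list of predecessor ids (back edges included)
    is_identity -- dict: node id -> True if the node cannot affect pointer aliasing
    """
    fresh = count()              # generator of new (deep) set identities
    in_set, out_set = {}, {}
    for n in nodes:
        # Out-set identities of the predecessors; None marks an unvisited one.
        pred_outs = {out_set.get(p) for p in preds.get(n, [])}
        if len(pred_outs) == 1 and None not in pred_outs:
            in_set[n] = pred_outs.pop()      # share the predecessors' Out set
        else:
            in_set[n] = next(fresh)          # deep set
        # An identity transfer function lets Out share In; otherwise allocate.
        out_set[n] = in_set[n] if is_identity[n] else next(fresh)
    return in_set, out_set


# Made-up CFG: 1 -> 2 -> 3 -> 4 with a back edge 4 -> 2; only node 1 assigns a pointer.
nodes = [1, 2, 3, 4]
preds = {2: [1, 4], 3: [2], 4: [3]}
is_identity = {1: False, 2: True, 3: True, 4: True}
in_set, out_set = assign_alias_sets(nodes, preds, is_identity)
print(len(set(in_set.values()) | set(out_set.values())))  # distinct sets allocated
```

On this toy CFG three sets are allocated instead of the eight that one In and
one Out set per node would require.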

Table 2 reports the number of alias sets before and after this optimization
was applied, as well as the percentage reduction. In addition to saving storage (on
average 73.78% fewer alias sets were allocated), this method saves visits during
the actual analysis to nodes that cannot affect pointer aliasing. Although we
allocate fewer alias sets, the real benefit of this technique is seen during the
analysis. Since we do not visit and propagate alias relations between extraneous
CFG nodes, we simply have fewer alias relations being stored and copied from
one CFG node to the next. This affects both the analysis’s time and space use
in a significant way.
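
The percentage column of the alias-set table follows directly from the two
counts. As a quick check, using the allroots and 052.alvinn counts from that
table (the helper name `percent_saved` is ours):

```python
def percent_saved(no_sharing, sharing):
    """Percentage of alias sets avoided by sharing, rounded to two decimals."""
    return round(100.0 * (no_sharing - sharing) / no_sharing, 2)


print(percent_saved(328, 87))   # allroots
print(percent_saved(464, 89))   # 052.alvinn
```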

Table 3 shows the effects of the efficiency-improving ideas on the flow-sensitive
analysis. Run times were collected as in Section 3.1 using the C function clock(),
which gives the CPU time, not the elapsed time. This metric was chosen because
it eliminates the effects of system load, amount of RAM vs. paging space, etc.
The elapsed time for each program was approximately 2.5 times the CPU time.

The headings for each column are read vertically — for example, the second 
column shows the analysis time in seconds with no sharing of alias sets as well 
as no other enhancements described in this section. The third column reports 
the analysis time and resulting speed-up using this sharing technique. The effec- 
tiveness of this technique is mostly related to the percentage of CFG nodes that 
affect a pointer and percentage of call nodes — the higher the percentages the 
lower the potential benefitO For all our benchmarks, these averages were 10.9% 

The method described is a conservative approximation to the SEG of 0 and is 
similar to 

The percentage of merge nodes plays a smaller role. 
Table 2. Effectiveness of sharing alias sets

                    Num. Alias Sets
Benchmark         No Sharing    Sharing    Pct Saved
allroots                 328         87       73.48%
052.alvinn               464         89       80.82%
01.qbsort                362        111       69.34%
15.trie                  360        125       65.28%
04.bisect                352         78       77.84%
17.bintr                 350        124       64.57%
anagram                  696        159       77.16%
lex315                  1088        260       76.10%
ks                      1054        317       69.92%
05.eks                   694        170       75.50%
08.main                 1166        399       65.78%
09.vor                  1842        633       65.64%
loader                  1434        382       73.36%
129.compress            1152        195       83.07%
ft                      1140        380       66.67%
football                5536        962       82.62%
compiler                3486        699       79.95%
assembler               3184        911       71.39%
yacr2                   3960        782       80.25%
simulator               5180       1203       76.78%
099.go                     -       4966            -
Average                                       73.78%


and 20.8%, respectively. (09.vor had high averages in these categories and has
the lowest effectiveness for this technique, while yacr2 had lower than average
percentages and resulted in a higher speed-up.)

Over all benchmarks, this technique results in a significant speed-up, 2.80 on
average. For our largest benchmark, 099.go, the analysis did not run (due to
insufficient memory) until we applied this optimization.

4.2 Worklists 

The initial implementation of the analyses used an iterative algorithm for
simplicity. After correctness was verified, a worklist-based implementation was
used to improve efficiency. Two types of worklists were used: SEG node worklists
and function worklists. Each function has a worklist of SEG nodes. The PCG has
two worklists of functions: “current” and “next.” (The motivation for using two
worklists is described in Section 4.3.)

We use nested “while not empty” loops with the worklists — the outer loop 
visiting functions and the inner loop visiting SEG nodes. The worklist of func- 
tions initially contains all functions. On the first visit to a function, we initialize 
the function’s SEG node worklist with all SEG nodes in that function. If a SEG 
node’s Out set changes, all its SEG successors are placed on its function’s SEG 
node worklist. If a function’s entry set changes, it is placed on the “next” func- 
tion worklist. If the exit set of a function changes, all calling functions are placed 
on the “next” function worklist and the calling call node(s) are placed on their 
respective function’s SEG node worklist. The analysis runs until all worklists are 
empty. 
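
The nested worklist loops described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in the study: facts are plain sets, an ordinary node computes Out = In ∪ gen, and a call node computes Out = In ∪ (callee's exit set). The input layout and all names (`analyze`, `funcs`, `push`) are hypothetical.

```python
from collections import deque

def analyze(funcs):
    # Predecessors and call sites, derived once from the input graph.
    preds = {f: {n: [] for n in d["nodes"]} for f, d in funcs.items()}
    callers = {f: [] for f in funcs}
    for f, d in funcs.items():
        for n, node in d["nodes"].items():
            for s in node["succs"]:
                preds[f][s].append(n)
            if node.get("callee"):
                callers[node["callee"]].append((f, n))

    entry = {f: set() for f in funcs}            # entry set per function
    out = {f: {n: set() for n in d["nodes"]} for f, d in funcs.items()}
    node_wl = {}                                 # node worklists, made on first visit
    current, nxt = deque(funcs), deque()         # the two function worklists

    def push(wl, item):
        if item not in wl:
            wl.append(item)

    while current or nxt:
        if not current:
            current, nxt = nxt, current          # swap "current" and "next"
        f = current.popleft()
        d = funcs[f]
        wl = node_wl.setdefault(f, deque(d["nodes"]))  # all nodes on first visit
        while wl:
            n = wl.popleft()
            node = d["nodes"][n]
            in_set = set(entry[f]) if n == d["entry"] else set()
            for p in preds[f][n]:
                in_set |= out[f][p]
            callee = node.get("callee")
            if callee:
                if not in_set <= entry[callee]:  # callee's entry set changes
                    entry[callee] |= in_set
                    push(nxt, callee)
                    if callee in node_wl:
                        push(node_wl[callee], funcs[callee]["entry"])
                new_out = in_set | out[callee][funcs[callee]["exit"]]
            else:
                new_out = in_set | node["gen"]
            if new_out != out[f][n]:
                out[f][n] = new_out
                for s in node["succs"]:
                    push(wl, s)
                if n == d["exit"]:               # exit set changed: revisit callers
                    for cf, cn in callers[f]:
                        push(nxt, cf)
                        if cf in node_wl:
                            push(node_wl[cf], cn)
    return out

# Tiny example: main generates (p, a), calls f, and f generates (q, b).
funcs = {
    "main": {"entry": "m0", "exit": "m2", "nodes": {
        "m0": {"gen": {("p", "a")}, "succs": ["m1"]},
        "m1": {"gen": set(), "succs": ["m2"], "callee": "f"},
        "m2": {"gen": set(), "succs": []}}},
    "f": {"entry": "f0", "exit": "f0", "nodes": {
        "f0": {"gen": {("q", "b")}, "succs": []}}},
}
result = analyze(funcs)
print(sorted(result["main"]["m2"]))
```

In this sketch a function whose entry set grows also has its entry node re-queued, so that the next visit to the function has work to do; the text leaves this detail implicit.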
Table 3. Flow-sensitive analysis run time in seconds. Numbers in parentheses are the
speed-up from the previous version to the left. (WL = worklists; FB Filter =
ForwardBind filter; [illegible] marks cells that could not be read in the source.)

Benchmark (LOC)          No Sharing  Sharing          Sharing          Sharing          Sharing          Overall
                         Iterating   Iterating        Unsorted WL      Sorted WL        Sorted WL        Speed-up
                         No Filter   No Filter        No Filter        No Filter        FB Filter

allroots (227)           1.41        0.54 (2.61)      [illegible]      0.24 (0.88)      0.10 (2.40)      14.10
052.alvinn (272)         1.03        0.39 (2.64)      0.09 (4.33)      0.10 (0.90)      0.10 (1.00)      10.30
01.qbsort (325)          8.22        4.16 (1.98)      1.81 (2.30)      [illegible]      0.75 (1.59)      10.96
15.trie (358)            4.59        2.26 (2.03)      0.83 (2.72)      [illegible]      0.45 (1.60)      10.20
04.bisect (463)          1.47        0.45 (3.27)      0.16 (2.81)      0.13 (1.23)      0.14 (0.93)      10.50
17.bintr (495)           2.20        1.19 (1.85)      0.33 (3.61)      0.32 (1.03)      0.32 (1.00)      6.88
anagram (650)            6.67        2.16 (3.09)      0.59 (3.66)      0.52 (1.13)      0.48 (1.08)      13.90
lex315 (733)             5.03        2.09 (2.41)      0.73 (2.86)      0.70 (1.04)      0.41 (1.71)      12.27
ks (782)                 20.48       9.40 (2.18)      3.32 (2.83)      2.55 (1.30)      1.49 (1.71)      13.74
05.eks (1202)            9.21        [illegible]      0.92 (3.12)      0.87 (1.06)      0.67 (1.30)      13.75
08.main (1206)           96.37       44.80 (2.15)     18.30 (2.45)     12.39 (1.48)     [illegible]      30.50
09.vor (1406)            217.19      113.71 (1.91)    38.70 (2.94)     32.96 (1.17)     11.92 (2.77)     18.22
loader (1539)            176.49      58.37 (3.02)     26.71 (2.19)     21.20 (1.26)     [illegible]      46.32
129.compress (1934)      6.00        1.82 (3.30)      [illegible]      0.53 (1.06)      0.41 (1.29)      14.63
ft (2156)                80.94       37.07 (2.18)     14.04 (2.64)     11.25 (1.25)     5.09 (2.21)      15.90
football (2354)          275.63      61.30 (4.50)     26.87 (2.28)     22.75 (1.18)     2.70 (8.43)      102.09
compiler (2360)          39.31       12.87 (3.05)     4.19 (3.07)      4.11 (1.02)      3.70 (1.11)      10.62
assembler (3446)         668.25      240.46 (2.78)    119.31 (2.02)    97.26 (1.23)     9.69 (10.04)     68.96
yacr2 (3979)             377.11      86.50 (4.36)     26.76 (3.23)     25.93 (1.03)     10.64 (2.44)     35.44
simulator (4639)         511.49      146.35 (3.49)    82.41 (1.78)     61.84 (1.33)     10.12 (6.11)     50.54
099.go (29637)           -           228.18           74.01 (3.08)     65.23 (1.13)     9.83 (6.64)      23.21
Averages                             [illegible]      (2.84)           [illegible]      [illegible]      25.38

Column 4 of Table 3 shows the improvement of the worklist-based implementation
over the iterating version, both of which use shared alias sets. The result
was an average speed-up of 2.84 over the iterating version, which produced an
average speed-up of 2.78 over the nonshared iterating version.

4.3 Sorted Worklists 

Using an iterating analysis of a forward data-flow problem, it is natural to process
the nodes in topological order. The next enhancement was to use priority queues
(based on a topological order) for the SEG and function worklists.

Consider the case of a loop body that generates aliasing information. It would
be best to process the loop body before moving on to the loop exit. Topological
order alone does not give this property: we may process the loop exit before
the loop body. (This would occur in Fig. [T] using the given node numbers as
the topological order.) However, during the construction of the CFG, nodes for
loop bodies are created before the nodes that follow the loop body. (Thus, node #7
would be created before node #3.) Since nodes are assigned numbers as they
are created (which is performed in a topological order, except in the presence of
gotos), using the node's creation number as a priority ensures that loop bodies
are processed first.

A result of using a single priority-based worklist of functions was that calling 
functions were visited before called functions. However, unlike SEG nodes, aliases 
can be propagated in both directions along a PCG edge. Thus, an optimal order 
of function visits is not apparent. In our benchmark suite, using a single priority- 
based worklist for functions provided only a marginal improvement over the 
iterating version. 

To increase efficiency, we use two function worklists — “current” and “next.” 
While visiting functions on the “current” worklist, we place functions on the 
“next” worklist. This has the effect that once a set of functions is on the worklist, 
the visiting order is fixed in a topological order. When the “current” worklist is 
empty, we swap the “current” and “next” worklists and continue the analysis. 
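
The "current"/"next" scheme with priority ordering can be sketched with a binary heap. This is an illustrative sketch only; the class name `TwoPhaseWorklist`, the `defer` method, and the use of `(priority, name)` tuples are assumptions, with the priority standing in for a function's position in the topological order.

```python
import heapq

class TwoPhaseWorklist:
    """Priority-ordered 'current' worklist plus a 'next' worklist for deferred items."""

    def __init__(self, items):
        self.current = list(items)
        heapq.heapify(self.current)
        self.next = set()

    def defer(self, item):
        # Items found while draining 'current' wait for the next pass.
        self.next.add(item)

    def pop(self):
        if not self.current:
            # 'current' is empty: swap in the deferred items, ordered by priority.
            self.current = list(self.next)
            heapq.heapify(self.current)
            self.next = set()
        return heapq.heappop(self.current)

    def __bool__(self):
        return bool(self.current or self.next)

wl = TwoPhaseWorklist([(2, "g"), (0, "main"), (1, "f")])
order = []
while wl:
    prio, fn = wl.pop()
    order.append(fn)
    if fn == "g" and order.count("g") == 1:
        wl.defer((0, "main"))  # pretend g's exit set changed, so its caller is revisited
print(order)
```

Because deferred functions never enter the heap that is currently being drained, the visiting order within one pass stays fixed, which is the effect described above.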

The fifth column of Table 3 reports the analysis time using sorted worklists
for both SEG nodes and functions and the previous enhancements. This
enhancement resulted in an average speed-up of 1.16 over the previous version,
which used nonpriority-based worklists and shared alias sets.



4.4 ForwardBind Filtering 

ForwardBind() is a function in our analysis that propagates alias relations from
call nodes to the entry set of called functions. If needed, it creates alias relations
for the formals from the actuals in the called function's entry set and then unions
in the call node's In set with the called function's entry set. Consider those alias
relations that cannot be changed or used in the called function (for example, the
Topological order on a cyclic graph can be obtained by performing a depth-first
search of the graph and then removing edges classified as back edges.
relation (*l,a) in Fig. Q), but are still propagated through the called function
until they reach the exit set of the function. These relations are then propagated
back to the call node's Out set. In effect, the called function acts as an identity
transfer function for those relations that are not relevant in the called function.
Although correct, this is inefficient.

Our enhancement is not to propagate from the call node those alias relations
that cannot be reached in a called function. To compute the set of alias relations
that are not reachable in the called function, F, we first call ForwardBind(),
which propagates alias relations into the entry set of F, Entry_F, as described
above. We then view all alias relations in Entry_F as a directed graph and remove
from this graph all vertices (distinguished objects), and associated edges (alias
relations), that cannot be reached from a global or a formal of F. These removed
edges (alias relations) are simply unioned into the call node's Out set directly.
This can help limit the propagation effects of the unrealizable path problem.
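
The filtering step amounts to a reachability computation on the graph of alias relations. The sketch below is one way to express it, assuming each alias relation is a (source, target) pair of object names and that an edge is kept exactly when its source vertex is reachable; the function name `split_reachable` and the sample names are hypothetical.

```python
def split_reachable(entry_relations, roots):
    """Partition alias relations (directed edges) by reachability from roots.

    Returns (kept, bypass): 'kept' stays in the callee's entry set, and
    'bypass' is unioned directly into the call node's Out set.
    """
    succs = {}
    for src, dst in entry_relations:
        succs.setdefault(src, set()).add(dst)
    seen, stack = set(roots), list(roots)
    while stack:                      # depth-first search from globals and formals
        v = stack.pop()
        for w in succs.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    kept = {(s, d) for s, d in entry_relations if s in seen}
    bypass = set(entry_relations) - kept
    return kept, bypass

# g is a global and formal_p a formal of the callee; l is a caller-local object.
relations = {("g", "x"), ("x", "y"), ("l", "a"), ("formal_p", "b")}
kept, bypass = split_reachable(relations, roots={"g", "formal_p"})
print(sorted(kept), sorted(bypass))
```

Here the relation on the caller-local object never enters the callee, so it is not copied from node to node inside the called function.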

The last column of Table 3 reports the effectiveness of this optimization. It
provided the most dramatic speed-up, an average of 3.09 over the previous
implementation, which used all other enhancements. Some programs, in particular
football and assembler, had a much higher than average speed-up resulting
from this filtering. These two programs share some common characteristics:
each has a single function that has both an unusually high number of
pointer-affecting statements and a very high number of called functions.

Figure 8 shows the effects of these optimizations in a dramatic way for the
loader benchmark. We collected the data presented in this graph by repeatedly
executing the “ps v” command (under AIX 4.1.5) while each of the five
FS analyses was running. We recorded the SIZE column, which gives the virtual
size of the data section of the process; this captures all heap-allocated memory
usage during the analysis. The x-axis of the chart is simply marked off in
samples (a sample is a single call to “ps v”).

As the majority of our heap allocations are used to represent alias relations, 
the resulting memory usage can be interpreted as the number of alias relations 
stored by the analysis while running. The “No Sharing” version shows a char- 
acteristic curve; it grows quickly early on, but then levels off as the analysis 
reaches its fixed point. The difference between the No Sharing and Sharing ver- 
sions shows how the large reduction in the number of alias sets reduced the 
number of alias relations that were stored. 

The cumulative effect of all five optimizations was an average speed-up of 
25.38. This illustrates the benefits that can be obtained by limiting the prop- 
agation of extraneous alias relations and the number of visits to functions and 
nodes. 
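
The per-step and overall speed-ups follow directly from the run times. As a worked check, the sketch below recomputes them for yacr2 from its five reported times; the helper name `speedup` is ours.

```python
def speedup(before: float, after: float) -> float:
    """Speed-up of one version over the previous one."""
    return before / after

# yacr2 run times in seconds: no sharing, sharing, unsorted worklists,
# sorted worklists, and ForwardBind filtering.
times = [377.11, 86.50, 26.76, 25.93, 10.64]
steps = [round(speedup(a, b), 2) for a, b in zip(times, times[1:])]
overall = round(speedup(times[0], times[-1]), 2)
print(steps, overall)
```

The overall figure is the ratio of the first and last times, which is also the product of the unrounded per-step speed-ups.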

5 Other Related Work 

This section describes other related work not mentioned in Section O 
The other benchmarks had similarly shaped graphs. 
Fig. 8. Memory usage over time for the loader benchmark.



Ruf presents an empirical study of two algorithms: a flow-sensitive algorithm
similar to the one we have implemented, and a context-sensitive version
of the same algorithm. His results showed that the context-sensitive algorithm
did not improve precision for pointers where they are dereferenced, but he
cautioned that this may be a characteristic of the benchmark suite analyzed. Wilson
and Lam present an algorithm for performing context-sensitive analysis
that avoids redundant analyses of functions for similar calling contexts. Ghiya
and Hendren present empirical data showing how points-to and connection
analyses can improve traditional transformations, array dependence testing, and
program understanding.

Ruf describes a program partitioning technique that is used for a flow-sensitive
points-to analysis, achieving a storage savings of 1.3-7.2 over existing
methods. Diwan et al. provide static and dynamic measurements of the
effectiveness of three flow-insensitive analyses for a type-safe language
(Modula-3). With the exception of AT, all three algorithms are less precise than
the versions we have studied. Zhang et al. report the effectiveness of applying
different pointer aliasing algorithms to different parts of a program.

Hasti and Horwitz present a pessimistic algorithm that attempts to
increase the precision of a flow-insensitive analysis by iterating over a
flow-insensitive analysis and an SSA construction. No empirical results are
reported. Horwitz [23] defines precise flow-insensitive alias analysis and proves
that, even in the absence of dynamic memory allocation, computing it is NP-hard.

6 Conclusions 

This work has described an empirical study of four pointer alias analysis algo- 
rithms that use varying degrees of flow-sensitivity. We have found that 
- the address-taken analysis, although efficient, is unlikely to provide sufficient
precision;

- the flow-insensitive analysis with kill is not beneficial;

- the precision of the flow-insensitive analysis is identical to that of the
flow-sensitive analysis in 12 of 21 programs in our benchmark suite;

- most published implementations of flow-sensitive pointer analysis have
equivalent precision.

Although the flow-sensitive analysis efficiently analyzed a program on the 
order of 30K LOCs, further benchmarks are needed to see if this property gen- 
eralizes. We have also empirically demonstrated how various implementation 
strategies result in significant analysis-time speed-up. 
Abstract. In this paper we present a dataflow analysis method for normal
logic programs interpreted with negation as failure or constructive
negation. We apply our method to a well-known analysis for logic programs:
the depth(k) analysis for approximating the set of computed
answers. The analysis is correct w.r.t. SLDNF resolution and optimal
w.r.t. constructive negation.

Keywords: Abstract interpretation, static analysis, logic programming, 
constructive negation. 



1 Introduction 

Important results have been achieved for static analysis using the theory of
abstract interpretation. Abstract interpretation is a general theory for specifying
and validating program analysis.

A key point in abstract interpretation is the choice of a reference semantics 
from which one can abstract the properties of interest. While it is always possi- 
ble to use the operational semantics, it is possible to get rid of useless details, 
by choosing a more abstract semantics as reference semantics. In the case of 
definite logic programs, much work has been done in this sense. Choosing the 
most abstract logical least model semantics limits the analysis to type inference 
properties that approximate the ground success set. Non-ground model semantics
have thus been developed, under the name of S-semantics [2], and proved
useful for a wide variety of goal-independent analyses ranging from groundness
to sharing, call patterns, etc. All the intermediate fixpoint semantics between
the most abstract logical one and the most concrete operational one, form in fact 
a hierarchy related by abstract interpretation, in which one can define a notion 
of the best reference semantics for a given analysis. 

On the other hand, less work has been done on the analysis of normal logic 
programs, although the finite failure principle, and hence SLDNF resolution, 
are standard practice. The most significant paper on the analysis of normal 
logic programs, using the theory of abstract interpretation, is the one by Marriott
and Søndergaard, which proposes a framework based on Fitting's least
three-valued model semantics. Since this reference semantics is a ground
semantics, the main application of this framework is type analysis. Marriott and
Søndergaard already pointed out that a choice of a different reference semantics
could lead to an improved analysis. Fitting's least three-valued model semantics
is, in fact, an abstraction (a non recursively enumerable one, yet easier to define)
of Kunen's three-valued logical semantics, which is more faithful to SLDNF
resolution and complete w.r.t. constructive negation.

These are exactly the directions along which we try to improve the results
of Marriott and Søndergaard. We consider the inference rule of constructive
negation, which provides normal logic programs with a sound and complete
operational semantics w.r.t. Kunen's logical semantics. We propose an analysis
method for normal logic programs interpreted with constructive negation,
based on the generalized S-semantics and on the hierarchy, both described
in earlier work. We present here an analysis based on the depth(k) domain which
approximates the computed answers obtained by constructive negation and therefore
the three-valued consequences of the program completion and CET (Clark's
equational theory). Other well-known analyses for logic programs can be extended to
normal logic programs. For example, starting from a suitable version of Clark's
semantics, a groundness analysis was defined which is correct and optimal w.r.t.
constructive negation. Here, for lack of space, we present only the depth(k)
analysis. We show that it is correct and also optimal w.r.t. constructive negation.
Finally we give an example which shows that in the case of type inference
properties our semantics yields a result which is more precise than the one obtained
by Marriott and Søndergaard.

From the technical point of view, the contribution of the paper is the def- 
inition of a normal form for first order constraints on the Herbrand Universe, 
which is suitable for analysis. In fact the normal form allows us to define an 
abstraction function which is a congruence w.r.t. the equivalence on constraints
induced by Clark's equality theory.

The paper is organized as follows. In Sect. 2 we introduce some preliminary
notions on constructive negation. In Sect. 3 we define a normal form on
the concrete domain of constraints in order to deal with equivalence classes of
constraints w.r.t. Clark's equational theory. We then define the abstract
domain and abstract operator and show its correctness and optimality (under
suitable assumptions on the depth of the cut) w.r.t. the concrete one. Finally,
we show an example.



2 Preliminaries 

The reader is assumed to be familiar with the terminology of and the basic
results in the semantics of logic programs and with the theory of abstract
interpretation.
2.1 Normal Logic Programs and Constructive Negation



We consider the equational version of normal logic programs, where a normal
program is a finite set of clauses of the form $A \leftarrow c \,|\, L_1, \ldots, L_n$,
where $n \geq 0$, $A$ is an atom, called the head, $c$ is a conjunction of equalities,
and $L_1, \ldots, L_n$ are literals. The local variables of a program clause are the
free variables in the clause which do not occur in the head. By $Var(A)$ we denote
the free variables in the atom $A$.

In order to deal with constructive negation, we need to consider the domain
$\mathcal{C}$ of full first-order equality constraints on the structure $\mathcal{H}$ of
the Herbrand domain. Assuming an infinite number of function symbols in the
alphabet, Clark's equational theory (CET) provides a complete decidable theory
for the constraint language, i.e.

1. (soundness) $\mathcal{H} \models CET$,

2. (completeness) for any constraint $c$, either $CET \models \exists c$ or $CET \models \neg \exists c$.

A constraint is in prenex form if all its quantifiers are in the head. The set of free
variables in a constraint $c$ is denoted by $Var(c)$. For a constraint $c$, we shall use
the notation $\exists c$ (resp. $\forall c$) to represent the constraint $\exists X\, c$
(resp. $\forall X\, c$) where $X = Var(c)$.

A constrained atom is a pair $c|A$ where $c$ is an $\mathcal{H}$-solvable constraint such
that $Var(c) \subseteq Var(A)$. The set of constrained atoms is denoted by $\mathcal{B}$.
A constrained interpretation is a subset of $\mathcal{B}$. A three-valued or partial
constrained interpretation is a pair of constrained interpretations
$\langle I^+, I^- \rangle$, representing the constrained atoms which are true and false
respectively (note that because of our interest in abstract interpretations we do not
impose any consistency condition between $I^+$ and $I^-$).

Constructive negation is a rule of inference introduced by Chan for normal
logic programs, which provides normal logic programs with a sound and
complete operational semantics w.r.t. Kunen's logical semantics. In logic
programming, Kunen's semantics is simply the set of three-valued consequences
of the program's completion and the theory CET.

The S-semantics of definite logic programs [2] has been generalized to normal
logic programs for a version of constructive negation, called constructive
negation by pruning. The idea of the fixpoint operator, which captures the set
of computed answer constraints, is to consider a non-ground finitary (hence
continuous) version of Fitting's operator. Here we give a definition of the operator
$T_P^{\mathcal{D}}$ which is parametric w.r.t. the domain $\mathcal{B}_{\mathcal{D}}$ of
constrained atoms and the operations on constraints on the domain $\mathcal{D}$.



Definition 1. Let $P$ be a normal logic program. $T_P^{\mathcal{D}}$ is an operator over
$\mathcal{P}(\mathcal{B}_{\mathcal{D}}) \times \mathcal{P}(\mathcal{B}_{\mathcal{D}})$ defined by

$T_P^{\mathcal{D}}(I)^+ = \{\, c|p(X) \in \mathcal{B}_{\mathcal{D}}$ : there exists a clause in $P$ with local variables $Y$,
    $C = p(X) \leftarrow d \,|\, A_1, \ldots, A_m, \neg A_{m+1}, \ldots, \neg A_n$,
    $c_1|A_1, \ldots, c_m|A_m \in I^+$, $c_{m+1}|A_{m+1}, \ldots, c_n|A_n \in I^-$,
    such that $c = \exists Y (d \wedge c_1 \wedge \ldots \wedge c_n) \,\}$

$T_P^{\mathcal{D}}(I)^- = \{\, c|p(X) \in \mathcal{B}_{\mathcal{D}}$ : for each clause defining $p$ in $P$ with local variables $Y_k$,
    $C_k = p(X) \leftarrow d_k \,|\, A_{k,1}, \ldots, A_{k,m_k}, \alpha_k$,
    there exist $e_{k,1}|A_{k,1}, \ldots, e_{k,m_k}|A_{k,m_k} \in I^-$,
    $n_k \geq m_k$, $e_{k,m_k+1}|A_{k,m_k+1}, \ldots, e_{k,n_k}|A_{k,n_k} \in I^+$,
    where for $m_k + 1 \leq j \leq n_k$, $\neg A_{k,j}$ occurs in $\alpha_k$,
    such that $c = \bigwedge_k \forall Y_k (\neg d_k \vee e_{k,1} \vee \ldots \vee e_{k,n_k}) \,\}$

where the operations $\exists, \forall, \neg, \vee, \wedge$ are the corresponding operations on the
constraint domain $\mathcal{D}$.

In the case of a normal logic program, the operator $T_P$ defines a generalized
S-semantics which is fully abstract w.r.t. the computed answer constraints with
constructive negation by pruning. By soundness it also approximates the set
of computed answer constraints under the SLDNF resolution rule, or under the
Prolog strategy.

It has been shown that this operator defines a hierarchy of reference
semantics related by abstract interpretation, which extends the hierarchy defined
by Giacobazzi for definite logic programs. Here we show the use of the
hierarchy for the static analysis of normal logic programs.



3 Normal Forms in CET

Unlike the semantics in Marriott and Søndergaard's framework, our reference
semantics is a non-ground semantics and has to deal with first-order equality
constraints. The first problem that arises is to define a normal form for such
constraints on the Herbrand domain, so that abstraction functions on constrained
atoms can be defined. In general, in fact, given a theory $th$, we are interested in
working with equivalence classes of constraints w.r.t. the equivalence of the
constraints in $th$. Namely, $c$ is equivalent to $c'$ if $th \models c \leftrightarrow c'$.
Therefore we need the abstraction function on the concrete constraint domain to be a
congruence. This is a necessary property since it makes the abstraction independent
of the syntactic form of the constraints.

Dealing with normal logic programs, we need to achieve this property in CET.
We thus need to introduce a normal form for first-order equality constraints, in
a similar way to what has been done for the analysis of definite programs where
the normal form is the unification solved form. Here we shall define a new
notion of “false-simplified” normal forms, making use of Colmerauer's solved
forms for inequalities, Maher's transformations for first-order constraints,
and an extended disjunctive normal form.

First let us motivate the need for a “false-simplified” form. Let us call a
basic constraint an equality or an inequality between a variable and a term. The
abstraction function will be defined inductively on the basic constraints, and it
will sometimes (e.g. for groundness analysis) abstract some inequalities to true.
Consider, for example, the constraint $d = \forall X (Y = b \wedge X \neq f(a))$.
$d$ is $\mathcal{H}$-equivalent to false. If the abstraction of $X \neq f(a)$ is true then the
abstraction of $d$ will be the abstraction of $Y = b$, which cannot be $\mathcal{H}$-equivalent
to the abstraction of false. Therefore we need to define a normal form where
the constraints which are $\mathcal{H}$-equivalent to false are eliminated.

Definition 2. Consider a constraint $d$ in prenex disjunctive form, $d = \Delta(\bigvee_i A_i)$,
where $\Delta$ is a sequence of quantified variables and $\bigvee_i A_i$ is a finite disjunction.
$d$ is in a false-simplified form if either there does not exist a proper subset $I$
of the $i$'s such that $\mathcal{H} \models \Delta(\bigvee_i A_i) \leftrightarrow \Delta(\bigvee_{i \in I} A_i)$,
or such an $I$ exists and there exists also a subset $K$ of $I$ such that
$\bigvee_{j \notin I} A_j$ is $\mathcal{H}$-equivalent to $\bigvee_{k \in K} A_k$.

The latter condition ensures that we really eliminate constraints that are
$\mathcal{H}$-equivalent to false and that are not just redundant in the constraint. Now the
existence of a false-simplified form for any constraint can be proved simply with
the following:

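Algorithm 3. Input: a constraint in prenex disjunctive form $d = \Delta(\bigvee_i A_i)$.

Let us call $U$ the set of the indices $i$ in $d = \Delta(\bigvee_i A_i)$.

1. Let $I$ and $J$ be the partition of $U$ such that $i \in I$ if $\mathcal{H} \models \exists \Delta(A_i)$,
otherwise $i \in J$.

2. Repeat $I := I \cup S$ as long as there exists an $S \subseteq J$ such that
$\mathcal{H} \models \exists \Delta(\bigvee_{i \in S} A_i)$ and for all $j \in S$,
$\mathcal{H} \not\models \exists \Delta(\bigvee_{i \in (S \setminus \{j\})} A_i)$.

3. Let $S \subseteq J \setminus I$ be any minimal set such that
$\mathcal{H} \models \exists \Delta(\bigvee_{s \in S} A_s \vee \bigvee_{i \in I} A_i)$ and
$\mathcal{H} \models \Delta(\bigvee_{s \in S} A_s \vee \bigvee_{i \in I} A_i) \leftrightarrow d$; do $I := I \cup S$.

4. Output: $\Delta(\bigvee_{i \in I} A_i)$.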

The idea of the algorithm is to find a subset of the conjunctions $A_i$ (those with
$i \in I$) such that $\Delta(\bigvee_{i \in I} A_i)$ is in false-simplified form and is
$\mathcal{H}$-equivalent to $\Delta(\bigvee_i A_i)$. In the first step we select the $A_i$'s such
that $\Delta(A_i)$ is $\mathcal{H}$-satisfiable. In this case, in fact, $A_i$ cannot be
$\mathcal{H}$-equivalent to false and it can be put in the set $I$. In the second step, from the
remaining $A_i$'s we select sets of $A_i$'s whose $\Delta$-quantified disjunction is
$\mathcal{H}$-satisfiable; since we check that all the $A_i$'s are necessary for the quantified
disjunction to be $\mathcal{H}$-satisfiable, the considered $A_i$'s cannot be $\mathcal{H}$-equivalent
to false. At the end of this process, if the resulting constraint is $\mathcal{H}$-equivalent
to the input constraint, we stop. Otherwise, we add a minimum number of the not yet
selected $A_i$'s such that $\Delta(\bigvee_i A_i)$ for the selected $i$'s is $\mathcal{H}$-equivalent
to the input constraint. Since we add a minimum number of not yet selected $A_i$'s, we
are sure that the resulting constraint is in false-simplified form. Example 4 shows
how Algorithm 3 works on two inputs.

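Example 4.

1. Input: $c_1 = \forall T (A_1 \vee A_2 \vee A_3 \vee A_4)$, with $A_1 = (T = f(H) \wedge Y = a)$,
$A_2 = (T \neq f(a) \wedge Y = b)$, $A_3 = (Y \neq g(H,T))$, $A_4 = (T \neq a \wedge Y = a)$.

$I_1 = \{3\}$.

$I_2 = \{3,1,2\}$.

$I_3 = \{3,1,2\}$ (since $\mathcal{H} \models \forall T(\bigvee_{i \in I_2} A_i) \leftrightarrow c_1$).

Output: $\forall T (A_1 \vee A_2 \vee A_3)$.

2. Input: $c_2 = \forall T (A'_1 \vee A'_2 \vee A'_3)$, with $A'_1 = (T \neq f(U) \wedge T \neq f(V))$,
$A'_2 = (T = H)$, $A'_3 = (U \neq V \wedge T = f(a))$.

$I_1 = \{\}$.

$I_2 = \{1,2\}$.

$I_3 = \{1,2,3\}$ (since $\mathcal{H} \not\models \forall T(\bigvee_{i \in I_2} A'_i) \leftrightarrow c_2$).

Output: $\forall T (A'_1 \vee A'_2 \vee A'_3)$.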

Theorem 5. For any input constraint $c = \Delta(\bigvee_i A_i)$, Algorithm 3 terminates
and computes a false-simplified form logically equivalent to $c$.

Note that all the false-simplified forms of a constraint $c$ are $\mathcal{H}$-equivalent.

Now the intuitive idea for a normal form is the following. We put a constraint
in prenex form and we compute the disjunctive form of its quantifier-free part.
We make equality and inequality constraints interact in every conjunction of the
disjunctive form and then we compute the false-simplified form for the resulting
constraint. The problem is that if we consider a standard disjunctive normal
form, we would not be able to see explicitly all the relations between constraints
in disjunctions. Consider, for example, the constraint $(X = f(H) \vee H \neq f(a))$.
This constraint is equivalent, and therefore $\mathcal{H}$-equivalent, to the constraint
$((X = f(f(a)) \wedge H = f(a)) \vee H \neq f(a))$. Note that the equality $H = f(a)$ is not
explicit in the first disjunction. Since the abstraction function will act on the
terms of the disjunction independently, this could cause a problem. This is why we
will use a well-known extended disjunctive form defined for Boolean algebra and
applied, in our case, to the algebra of quantifier-free constraints.

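In the next theorem, $B_i$ denotes a basic equality or inequality constraint
($X = t$ or $X \neq t$). For any $B_i$, let $B_i^{false} = \neg B_i$ and $B_i^{true} = B_i$.

Theorem 6. For every Boolean formula $\phi$ on basic equality or inequality
constraints $B_1, \ldots, B_n$, $\phi \leftrightarrow \psi$, where

$$\psi = \bigvee_{(a_1,\ldots,a_n) \in \{false,true\}^n} \left( \phi(a_1,\ldots,a_n) \wedge B_1^{a_1} \wedge \ldots \wedge B_n^{a_n} \right).$$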

Note that $\psi$ is a formula in disjunctive form. $\psi$ has in fact a particular disjunctive
form where each conjunction contains all the basic constraints (possibly negated)
of $\phi$. This is why this form is able to capture all the possible relations between
the different terms of a disjunction.

We will call the formula $\psi$ the extended disjunctive normal form (dnf) of $\phi$.
The next example shows how the extended disjunctive normal form works on a
constraint $c_1$.
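
The expansion in Theorem 6 can be computed by enumerating truth assignments of the basic constraints and keeping those under which the formula holds. The sketch below does this for a formula given as a Python predicate; the names `extended_dnf` and `show`, and the encoding of basic constraints as Boolean arguments, are assumptions made for illustration.

```python
from itertools import product

def extended_dnf(phi, n):
    """Truth assignments (a_1..a_n) whose full conjunction appears in the expansion."""
    return [a for a in product([False, True], repeat=n) if phi(*a)]

def show(assignment, names):
    """Render one conjunction, writing ~B for a negated basic constraint."""
    return " & ".join(nm if val else f"~{nm}" for nm, val in zip(names, assignment))

# (X = f(H)) or (H != f(a)), with B1 standing for X = f(H) and B2 for H = f(a).
phi = lambda b1, b2: b1 or not b2
terms = extended_dnf(phi, 2)
rendered = [show(t, ["B1", "B2"]) for t in terms]
print(" | ".join(f"({r})" for r in rendered))
```

Each conjunction mentions every basic constraint, positively or negatively, which is the property the extended form relies on.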

Example 7. $c_1 = (X = f(H) \vee H \neq f(a))$.

$dnf(c_1) = ((X = f(f(a)) \wedge H = f(a)) \vee (X = f(H) \wedge H \neq f(a)) \vee (X \neq f(H) \wedge H \neq f(a)))$.

Note that although $c_1$, $dnf(c_1)$ and $((X = f(f(a)) \wedge H = f(a)) \vee H \neq f(a))$
are $\mathcal{H}$-equivalent, $dnf(c_1)$ is the most “complete” in the sense that it shows
syntactically all the relations between constraints in disjunctions.

Footnote: $U = b$, $H = f(b)$, $V = a$ is, in fact, an assignment (for the free variables
of $c_2$) which is a solution of $c_2$ but is not a solution of $\forall T (A'_1 \vee A'_2)$.
We can now define the normal form, $Res(c)$, of a first-order equality
constraint $c$, as the result of the following steps:

1. Put the constraint $c$ in prenex form obtaining $\Delta(c_1)$, where $\Delta$ is a sequence
of quantified variables and $c_1$ is the quantifier-free part of $c$.

2. Compute $dnf(c_1) = \bigvee(A_i)$.

3. Simplify each conjunction $A_i$ obtaining $A'_i = ResConj(A_i)$.

4. Return a false-simplified form of the constraint $\Delta(\bigvee A'_i)$.

The procedure for simplifying each conjunction is based on Maher's canonical
form and Colmerauer's simplification algorithm for inequalities. The
procedure performs the following steps.

$ResConj(A)$

1. Compute a unification solved form for the equalities in the conjunction $A$.

2. For each equality $X = t$ in $A$, substitute $X$ by $t$ at each occurrence of $X$ in
the inequalities of conjunction $A$.

3. Simplify the inequalities by applying the following rules, obtaining $A'$:

a) replace $f(t_1, \ldots, t_n) \neq f(s_1, \ldots, s_n)$ by $t_1 \neq s_1 \vee \ldots \vee t_n \neq s_n$;

b) replace $f(t_1, \ldots, t_n) \neq g(s_1, \ldots, s_m)$ by true;

c) replace $t \neq x$ by $x \neq t$ if $t$ is not a variable.

4. If $A'$ is a conjunction then return $A'$.

5. Otherwise compute $dnf(A') = \bigvee(A_i)$ and return $\bigvee ResConj(A_i)$.

It is worth noting that the previous algorithm terminates since each constraint 
contains a finite number of inequalities. 
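
Rules (a) to (c) of step 3 can be illustrated on a small term representation. The sketch below applies one rule to a single inequality; it is not the full procedure. It assumes variables are Python strings and compound terms or constants are tuples whose first element is the function symbol; the name `simplify_neq` is hypothetical, and treating an arity mismatch like a symbol mismatch is our own simplification.

```python
def is_var(t):
    return isinstance(t, str)

def simplify_neq(t, s):
    """Apply one of rules (a)-(c) to the inequality t != s.

    Returns True, or a list of inequalities read as a disjunction
    (an empty list is the empty disjunction, i.e. false).
    """
    if not is_var(t) and not is_var(s):
        if t[0] != s[0] or len(t) != len(s):
            return True                               # rule (b): different symbols
        return [(ti, si) for ti, si in zip(t[1:], s[1:])]  # rule (a): decompose
    if not is_var(t) and is_var(s):
        return [(s, t)]                               # rule (c): variable on the left
    return [(t, s)]                                   # already of the form x != t

a = ("a",)
# f(Z, a) != f(f(H), H) decomposes into Z != f(H) or a != H.
step1 = simplify_neq(("f", "Z", a), ("f", ("f", "H"), "H"))
# a != H is then reoriented to H != a.
step2 = simplify_neq(*step1[1])
# a != f(a) is simply true.
step3 = simplify_neq(a, ("f", a))
print(step1, step2, step3)
```

The first two calls reproduce the passage from $f(Z,a) \neq f(f(H),H)$ to $Z \neq f(H) \vee H \neq a$ that appears in the second part of the worked example below.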

Example 8 shows how the procedure Res computes the normal form of some
constraints.

Example 8.

1. $c = (X = f(Y) \wedge (Y = a \vee Y = f(a)) \wedge \forall U\, X \neq f(f(U)))$.

$c_1 = \forall U (X = f(Y) \wedge (Y = a \vee Y = f(a)) \wedge X \neq f(f(U)))$.

$c_2 = \forall U ( (X = f(Y) \wedge Y \neq a \wedge Y = f(a) \wedge X \neq f(f(U))) \vee$
$(X = f(Y) \wedge Y = a \wedge Y \neq f(a) \wedge X \neq f(f(U))) \vee$
$(X = f(Y) \wedge Y = a \wedge Y = f(a) \wedge X \neq f(f(U))) )$.

$c_{3.1} = \forall U ( (X = f(f(a)) \wedge Y \neq a \wedge Y = f(a) \wedge X \neq f(f(U))) \vee$
$(X = f(a) \wedge Y = a \wedge Y \neq f(a) \wedge X \neq f(f(U))) )$.

$c_{3.2} = \forall U ( (X = f(f(a)) \wedge f(a) \neq a \wedge Y = f(a) \wedge f(f(a)) \neq f(f(U))) \vee$
$(X = f(a) \wedge Y = a \wedge a \neq f(a) \wedge f(a) \neq f(f(U))) )$.

$c_{3.3} = \forall U ( (X = f(f(a)) \wedge Y = f(a) \wedge a \neq U) \vee$
$(X = f(a) \wedge Y = a \wedge a \neq f(U)) )$.

$c_4 = (X = f(a) \wedge Y = a)$.
2. $c = (X = f(Z, S) \wedge U = f(f(H), H) \wedge S = a \wedge X \neq U)$.

$c_2 = c_1 = c$.

$c_{3.1} = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge X \neq U)$.

$c_{3.2} = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge f(Z, a) \neq f(f(H), H))$.

$c_{3.3} = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge (Z \neq f(H) \vee H \neq a))$.

$c_{3.4} = A_1 \vee A_2 \vee A_3$.

$A_1 = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge Z = f(H) \wedge H \neq a)$.

$A_2 = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge Z \neq f(H) \wedge H \neq a)$.

$A_3 = (X = f(Z, a) \wedge U = f(f(H), H) \wedge S = a \wedge Z \neq f(H) \wedge H = a)$.

$ResConj(A_1) = A'_1$, $ResConj(A_2) = A_2$, $ResConj(A_3) = A'_3$.

$A'_1 = (X = f(f(H), a) \wedge U = f(f(H), H) \wedge S = a \wedge Z = f(H) \wedge H \neq a)$.
$A'_3 = (X = f(Z, a) \wedge U = f(f(a), a) \wedge S = a \wedge Z \neq f(a) \wedge H = a)$.

$c_4 = A'_1 \vee A_2 \vee A'_3$.

Note that all the steps in ResConj and Res preserve $\mathcal{H}$-equivalence: the
third step of ResConj is Colmerauer's simplification algorithm for inequalities,
the first and second transformations are the usual ones for CET formulas,
while the second step of Res is the extended disjunctive normal transformation.
Hence we get:

Proposition 9. $\mathcal{H} \models \phi \leftrightarrow Res(\phi)$.

Our concrete constraints domain $\mathcal{NC}$ will be the subset of constraints in $\mathcal{C}$ which
are in normal form. The concrete operations on $\mathcal{NC}$ will be thus defined using
the normal form:

Definition 10. Let $c_1, c_2 \in \mathcal{NC}$,

$c_1 \wedge^c c_2 = Res(c_1 \wedge c_2)$, $c_1 \vee^c c_2 = Res(c_1 \vee c_2)$,

$\neg^c c_1 = Res(\neg c_1)$, $\exists^c X\, c_1 = \exists X\, c_1$, $\forall^c X\, c_1 = \forall X\, c_1$.

We denote by B the set of constrained atoms with constraints in $\mathcal{NC}$, and by
$(\mathcal{I}, \subseteq)$ the complete lattice of (not necessarily consistent) partial constrained
interpretations formed over B.



4 Depth(k) Analysis for Constructive Negation

The idea of depth(k) analysis was first introduced in [?]. The domain of depth(k)
analysis was then used in order to approximate the ground success and failure
sets for normal programs in [?].

We follow the formalization of [?] for positive logic programs. We want to
approximate an infinite set of computed answer constraints by means of a
constraint depth(k) cut, i.e. constraints where the equalities and inequalities are
between variables and terms which have a depth not greater than k.

Our concrete domain is the complete lattice of partial constrained
interpretations $(\mathcal{I}, \subseteq)$ of the previous section. Since our aim is to approximate the
computed answer constraints, the fixpoint semantics we choose in the hierarchy
[?] is the one which generalizes the S-semantics to normal logic programs, the
Tp® operator (cf def. 0. The version we consider here is the one defined on 
the domain B with the concrete operations in AfC, A^, V^, (the Tp 

operator) . 



4.1 The Abstract Domain 



Terms are cut by replacing each subterm rooted at depth greater than k by a
new fresh variable taken from a set W (disjoint from V). The depth(k) terms
represent each term obtained by instantiating the variables of W with terms
built over V.

Consider the depth function $|\cdot| : Term \rightarrow \mathbb{N}$ such that

$|t| = 1$ if t is a constant or a variable,

$|t| = \max\{|t_1|, \ldots, |t_n|\} + 1$ if $t = f(t_1, \ldots, t_n)$,

and a given positive integer k. The abstract term $\alpha_k(t)$ is the term obtained
from the concrete one by substituting a fresh variable (belonging to W) to each
subterm $t'$ in t, such that $|t| - |t'| = k$.
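The depth function and the cut can be sketched in Python as below. This is a hedged illustration only: the term encoding, the names `depth`, `cut` and `fresh_vars`, and the naming of fresh variables as Q1, Q2, ... are assumptions made here. The cutting convention is also an assumption: the sketch replaces every compound subterm that would make the result deeper than k, so that the abstract term has depth at most k, which agrees with the k = 2 examples given later in this section.

```python
import itertools

# Hypothetical encoding: a variable is a str, a compound term or constant is a
# tuple (functor, arg1, ..., argn); the constant a is ('a',).


def is_leaf(t):
    """Variables and constants are the terms of depth 1."""
    return isinstance(t, str) or len(t) == 1


def depth(t):
    """|t| = 1 for constants and variables, 1 + max of the arguments otherwise."""
    if is_leaf(t):
        return 1
    return 1 + max(depth(arg) for arg in t[1:])


def fresh_vars():
    """Generator of fresh variable names, standing for the set W."""
    return (f"Q{i}" for i in itertools.count(1))


def cut(t, k, fresh):
    """Cut t so that the result has depth at most k (assumed convention)."""
    if is_leaf(t):
        return t
    if k <= 1:
        return next(fresh)           # compound subterm that is too deep
    return (t[0],) + tuple(cut(arg, k - 1, fresh) for arg in t[1:])
```

With this convention and k = 2, the term $f(f(T))$ is cut to $f(Q_1)$ and $f(Z, f(H))$ is cut to $f(Z, Q_1)$, while $f(a)$ is left unchanged.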

Consider now the abstract basic constraints

$ABC = \{\, c \mid c = (X = t),\ |t| \leq k, \text{ or } c = (X \neq t'),\ |t'| \leq k \text{ and } Var(t') \cap W = \emptyset \,\}$.

Note that $Var(t') \cap W = \emptyset$ expresses the fact that inequalities between variables
and cut terms are not allowed. The domain of abstract constraints is defined as
follows.

Definition 11.

$\mathcal{ANC} = \{\, c \mid c$ is a constraint in normal form built with
the logical connectives $\vee$, $\wedge$, $\forall$ and $\exists$ on $ABC \,\}$.

The concepts of abstract constrained atoms and partial abstract interpretations
are defined as expected.

Definition 12. An abstract constrained atom is a pair $c|A$ such that $c \in \mathcal{ANC}$
and c is an $\mathcal{H}$-solvable constraint, A is an atom and $Var(c) \subseteq Var(A)$. With
$B^a$ we intend the set of abstract constrained atoms.

The abstract domain is the set of partial interpretations on abstract constrained
atoms. A partial abstract constrained interpretation for a program is a pair of
sets of abstract constrained atoms, $I^a = \langle I^{a+}, I^{a-} \rangle$, not necessarily consistent.

We consider $\mathcal{I}^a = \{ I^a \mid I^a \text{ is a partial interpretation} \}$.

With respect to the case of definite logic programs [?], we need to define a
different order on the abstract constraint domain.
This is because the result $c^a$ of an abstract and operation on the abstract
constraint domain will be an approximation of the result c of the concrete and
operation on the concrete constraint domain, in the sense that $c^a$ will be "more
general" than the abstraction of c (where here "more general" means "is implied
under $\mathcal{H}$").

This motivates the definition of the following relation on the abstract constraint
domain.

Definition 13. Let $c, c' \in \mathcal{ANC}$. $c \preceq_a c'$ if $\mathcal{H} \models c \rightarrow c'$.

We consider the order $\leq_a$ induced by the preorder $\preceq_a$, namely the order obtained
by considering the classes modulo the equivalence induced by $\preceq_a$.
We define the downward closure of a pair of sets w.r.t. the $\leq_a$ order.

Definition 14. Consider a pair of sets of constrained atoms $B = \langle B^+, B^- \rangle$.

By $\downarrow B$ we denote the downward closure of $\langle B^+, B^- \rangle$:

$c|A \in \downarrow B^+$ if there exists $c'|A \in B^+$ and $c \leq_a c'$,

$c|A \in \downarrow B^-$ if there exists $c'|A \in B^-$ and $c \leq_a c'$.

Intuitively, a set of constrained atoms I is less than or equal to J if $\downarrow I \subseteq \downarrow J$.

Definition 15. Consider $I^a, J^a \in \mathcal{I}^a$.

$I^a \preceq J^a \iff$ for all $c|A \in I^{a+}$ $\exists c'|A \in J^{a+}$ such that $c \leq_a c'$ and

for all $c|A \in I^{a-}$ $\exists c'|A \in J^{a-}$ such that $c \leq_a c'$.

It is immediate to see that $\preceq$ defines a preorder. We consider the order $\leq$ induced
by the preorder $\preceq$, namely the order obtained by considering the classes of $\mathcal{I}^a$
modulo the equivalence induced by $\preceq$. Then our abstract domain is $(\mathcal{I}^a, \leq)$.
Since the operations on the equivalence classes are independent of the choice of
the representative, we denote the class of an interpretation $I^a$ by $I^a$ itself. In
the rest of the paper, we often abuse notation by denoting by $I^a$ the equivalence
class of $I^a$ or the interpretation $I^a$ itself.

4.2 The Abstraction Function 

Let us now define the abstraction function. To this aim we first define the
function $\alpha_c$ on constraints. The main idea is to define $\alpha_c$ on the basic constraints as
follows: an equality $X = t$ is abstracted to $X = \alpha_k(t)$, while an inequality $X \neq t$
is abstracted to $X \neq t$ if $|t| \leq k$ and to true otherwise.

We denote by $\Delta(c)$ the constraint $c'$ in normal form and by $\Delta$ the sequence of
quantified variables of $c'$, where c is the quantifier-free part of $c'$.

Definition 16. The depth(k) constraint abstraction function is the function
$\alpha_c : \mathcal{NC} \rightarrow \mathcal{ANC}$:

$\alpha_c(\Delta(c)) = \Delta, \Delta'\, \alpha_c(c)$ where $\Delta' = \exists Y_1, \exists Y_2, \ldots$ and $Y_i \in (W \cap Var(\alpha_c(c)))$,

$\alpha_c(X = t) = (X = \alpha_k(t))$,

$\alpha_c(false) = false$, $\alpha_c(true) = true$,

$\alpha_c(X \neq t) = (X \neq t)$ if $|t| \leq k$, $\alpha_c(X \neq t) = true$ if $|t| > k$,

$\alpha_c(A \wedge B) = \alpha_c(A) \wedge \alpha_c(B)$, $\alpha_c(A \vee B) = \alpha_c(A) \vee \alpha_c(B)$.
Note that the first definition means that all the new variables introduced by the
cut terms have to be considered existentially quantified.
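The quantifier-free part of the constraint abstraction can be sketched in Python as follows. The sketch is illustrative only and rests on several assumptions: the encoding of terms and constraints as tuples, the names `alpha_c`, `cut`, `depth` and `fresh_vars`, and the cutting convention (the result of a cut has depth at most k) are all introduced here. The quantifier prefix is not modelled, so the fresh variables Q1, Q2, ... must be read as existentially quantified.

```python
import itertools

# Hypothetical encoding. Terms: a variable is a str, a compound term or
# constant is a tuple (functor, arg1, ..., argn). Constraints: 'true', 'false',
# ('eq', X, t), ('neq', X, t), ('and', c1, c2), ('or', c1, c2).


def is_leaf(t):
    return isinstance(t, str) or len(t) == 1


def depth(t):
    return 1 if is_leaf(t) else 1 + max(depth(arg) for arg in t[1:])


def fresh_vars():
    return (f"Q{i}" for i in itertools.count(1))


def cut(t, k, fresh):
    """Assumed convention: the cut term has depth at most k."""
    if is_leaf(t):
        return t
    if k <= 1:
        return next(fresh)
    return (t[0],) + tuple(cut(arg, k - 1, fresh) for arg in t[1:])


def alpha_c(c, k, fresh):
    """Abstract a quantifier-free constraint, case by case."""
    if c in ('true', 'false'):
        return c
    tag = c[0]
    if tag == 'eq':                  # X = t  becomes  X = cut of t
        return ('eq', c[1], cut(c[2], k, fresh))
    if tag == 'neq':                 # X != t is kept only if |t| <= k
        return c if depth(c[2]) <= k else 'true'
    if tag in ('and', 'or'):         # homomorphic on the connectives
        return (tag, alpha_c(c[1], k, fresh), alpha_c(c[2], k, fresh))
    raise ValueError(f"unknown constraint: {c!r}")
```

For k = 2, the conjunction $H = f(f(T)) \wedge T \neq f(f(U)) \wedge X = f(U)$ is abstracted to $H = f(Q_1) \wedge true \wedge X = f(U)$.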

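Example 17 shows an application of $\alpha_c$.

Example 17. $c = \forall U ((H = f(f(T)) \wedge T \neq f(f(U)) \wedge X = f(U)) \vee$

$(H = f(f(T)) \wedge T \neq f(X) \wedge X \neq f(U)))$, $k = 2$.

$\alpha_c(c) = \alpha_c(\forall U ((H = f(f(T)) \wedge T \neq f(f(U)) \wedge X = f(U)) \vee$

$(H = f(f(T)) \wedge T \neq f(X) \wedge X \neq f(U)))) =$

$\forall U ((\alpha_c(H = f(f(T))) \wedge \alpha_c(T \neq f(f(U))) \wedge \alpha_c(X = f(U))) \vee$

$(\alpha_c(H = f(f(T))) \wedge \alpha_c(T \neq f(X)) \wedge \alpha_c(X \neq f(U)))) =$

$\forall U, \exists Q_1, Q_2 ((H = f(Q_1) \wedge true \wedge X = f(U)) \vee$

$(H = f(Q_2) \wedge T \neq f(X) \wedge X \neq f(U)))$ $(Q_1, Q_2 \in W)$.

The abstraction function $\alpha$ is defined by applying $\alpha_c$ to every constraint of the
constrained atoms in the concrete interpretation.

Definition 18. Let $\alpha : \mathcal{I} \rightarrow \mathcal{I}^a$, $\alpha = \langle \alpha^+, \alpha^- \rangle$:

$\alpha^+(I) = \{ c|A \mid c'|A \in I^+ \text{ and } \alpha_c(c') = c \}$,

$\alpha^-(I) = \{ c|A \mid c'|A \in I^- \text{ and } \alpha_c(c') = c \}$.

As a consequence the function $\gamma$ on (equivalence classes of) sets of abstract
constraints is automatically determined as follows:

Definition 19. Let $\gamma : \mathcal{I}^a \rightarrow \mathcal{I}$:

$\gamma(I^a) = \cup \{ I \mid \alpha(I) \leq I^a \} =$

$\cup \{ I \mid \forall c|A \in \alpha^+(I)\ \exists c'|A \in I^{a+}$ such that $c \leq_a c'$ and

$\forall c|A \in \alpha^-(I)\ \exists c'|A \in I^{a-}$ such that $c \leq_a c' \} =$

$\cup \{ I \mid \downarrow \alpha(I) \subseteq \downarrow I^a \} = \cup \{ I \mid \alpha(I) \subseteq \downarrow I^a \}$.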

Lemma 20. $\alpha$ is additive.

Theorem 21. $\langle \alpha, \gamma \rangle$ is a Galois insertion of $(\mathcal{I}, \subseteq)$ into $(\mathcal{I}^a, \leq)$.



4.3 $\alpha_c$ is a Congruence w.r.t. the $\mathcal{H}$-Equivalence

As we have already pointed out, we want to work with $\mathcal{H}$-equivalence
classes of constraints and, for this purpose, we need to be sure that the above
defined function $\alpha_c$ on $\mathcal{NC}$ is a congruence w.r.t. the $\mathcal{H}$-equivalence. This means
that if two constraints $c, c' \in \mathcal{NC}$ are $\mathcal{H}$-equivalent, then also $\alpha_c(c)$ and $\alpha_c(c')$
have to be $\mathcal{H}$-equivalent.

In order to understand whether two constraints are $\mathcal{H}$-equivalent, it is useful
to state the following result.

Lemma 22. Consider the inequality $X \neq t$. There exist no arbitrarily quantified
$t_1, \ldots, t_n$, where $t_i \neq t$, such that $X \neq t$ is $\mathcal{H}$-equivalent to $\bigwedge_i X \neq t_i$.

This is a consequence of the fact that we consider the models of the theory CET
without the DCA axiom.

The previous result, together with the fact that constraints are in
false-simplified form, allows us to claim that $\alpha_c$ is a congruence.

Theorem 23. Let $c, c' \in \mathcal{NC}$. If $\mathcal{H} \models c \leftrightarrow c'$ then $\mathcal{H} \models \alpha_c(c) \leftrightarrow \alpha_c(c')$.

4.4 The Abstract Fixpoint Operator 

We now define the abstract operations that will replace the concrete ones in the 
definition of the fixpoint abstract operator. We show that the abstract operations 
are a correct approximation of the concrete operations. 

The definition of the abstract and operation is not immediate. Example 24
is meant to give some intuition on some problems that may arise.

Example 24. Consider the following two constraints:

$c_1 = (X = f(Z, f(H)) \wedge S = f(a))$, $c_2 = (U \neq X \wedge Y \neq f(S))$ and $k = 2$.
Consider $\alpha_c(c_1) = \exists Q (X = f(Z, Q) \wedge S = f(a))$, $\alpha_c(c_2) = (U \neq X \wedge Y \neq f(S))$.
If we now consider the normalized form of $\alpha_c(c_1) \wedge \alpha_c(c_2)$ the resulting constraint
is $\exists Q (U \neq f(Z, Q) \wedge Y \neq f(f(a)) \wedge X = f(Z, Q) \wedge S = f(a))$, which is not an
abstract constraint according to Definition 11.

The problem is that the normalized form of the logical and operation on two
abstract constraints is not in general an abstract constraint (the depth of the
terms involved in equalities and inequalities can be greater than k and it can
contain inequalities between variables and cut terms).

This is the reason why we need to define a new operator $\mathcal{M}$ on the normalized
forms of abstract constraints. The $\mathcal{M}$ operator must cut terms deeper than k
and replace by true all the inequalities which contain a cut term. Intuitively this
is because $X \neq t$, where $Var(t) \cap W \neq \emptyset$, represents, on the concrete domain, an
inequality between a variable and a term deeper than k. On the abstract domain,
such inequalities are abstracted to the constant true.

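Definition 25. Let $\mathcal{M} : \mathcal{NC} \rightarrow \mathcal{ANC}$,

$\mathcal{M}(\Delta(c)) = \Delta, \Delta'\, \mathcal{M}(c)$ where $\Delta' = \exists Y_1, \exists Y_2, \ldots$ and $Y_i \in (W \cap Var(\mathcal{M}(c)))$,

$\mathcal{M}(X = t) = (X = \alpha_k(t))$,

$\mathcal{M}(X \neq t) = (X \neq t)$ if $|t| \leq k$ and $Var(t) \cap W = \emptyset$,

$\mathcal{M}(X \neq t) = true$ if $|t| > k$ or $Var(t) \cap W \neq \emptyset$,

$\mathcal{M}(A \wedge B) = \mathcal{M}(A) \wedge \mathcal{M}(B)$, $\mathcal{M}(A \vee B) = \mathcal{M}(A) \vee \mathcal{M}(B)$.

As expected, the $\mathcal{M}$ operator is similar to the $\alpha_c$ operator. The only difference
is that $\mathcal{M}$ replaces by true all the inequalities between variables and cut terms.
Since $\mathcal{ANC}$ is a subset of $\mathcal{NC}$, the Res form is defined also on the abstract
constraints domain.

Definition 26. Let $c_1, c_2 \in \mathcal{ANC}$,

$c_1 \tilde{\wedge} c_2 = \mathcal{M}(Res(c_1 \wedge c_2))$, $c_1 \tilde{\vee} c_2 = \mathcal{M}(Res(c_1 \vee c_2))$,

$\tilde{\neg} c_1 = \mathcal{M}(Res(\neg c_1))$, $\tilde{\exists} X\, c_1 = \exists X\, c_1$, $\tilde{\forall} X\, c_1 = \forall X\, c_1$.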
It is worth noting that the procedure Res on the abstract domain needs to
perform the logical and on abstract constraints. This means that most of the
observations that can be made on the behavior of the abstract and operation
also concern the abstract or and not operations.

Example 27 illustrates the relation between the abstract and operation and
the abstraction of the concrete and operation. For the sake of simplicity, since
in this case it does not affect the result, we write the constraint $c_1$ in the more
compact standard disjunctive form rather than in extended disjunctive form.

Example 27. $c_1 = \forall K ((Y = a \wedge U \neq f(f(K))) \vee Z = a)$, $c_2 = (U = f(f(a)))$.
Consider $k = 1$. $\alpha_c(c_1) = (Y = a \vee Z = a)$, $\alpha_c(c_2) = \exists V\, U = f(V)$.
$\alpha_c(c_1) \tilde{\wedge} \alpha_c(c_2) = \exists V ((Y = a \wedge U = f(V)) \vee (Z = a \wedge U = f(V)))$.
$\alpha_c(Res(c_1 \wedge c_2)) = \exists V (Z = a \wedge U = f(V))$.

$\mathcal{H} \models \alpha_c(Res(c_1 \wedge c_2)) \rightarrow \alpha_c(c_1) \tilde{\wedge} \alpha_c(c_2)$.

As already pointed out, the abstract and gives a more general constraint than
the abstraction of the one computed by the concrete and. This is the reason
why we have defined an approximation order based on implication (under $\mathcal{H}$)
between constraints.

In order to show that the abstract operations are correct, we prove a stronger 
property. 

Theorem 28. Let $c_1, c_2 \in \mathcal{NC}$.

$\alpha_c(c_1) \tilde{\wedge} \alpha_c(c_2) \geq_a \alpha_c(c_1 \wedge^c c_2)$, $\alpha_c(c_1) \tilde{\vee} \alpha_c(c_2) \geq_a \alpha_c(c_1 \vee^c c_2)$,

$\tilde{\exists} x\, \alpha_c(c_1) = \alpha_c(\exists^c x\, c_1)$, $\tilde{\forall} x\, \alpha_c(c_1) = \alpha_c(\forall^c x\, c_1)$.

As shown by Example 29, the correctness property does not hold for the version
of abstract "not" which we have defined, if we consider general constraints.

Example 29. Consider $c_1 = (X \neq f(f(a)))$ and $k = 1$.

$\alpha_c(\neg^c c_1) = \exists Y\, X = f(Y)$, which does not imply $\tilde{\neg}(\alpha_c(c_1)) = false$.

Since the not operator is used by the abstract fixpoint operator on "simpler"
constraints (the program constraints) only, we are interested in its behavior on
conjunctions of equalities between variables and terms only. For this kind of
constraints the following result holds.

Lemma 30. If $c_1 = (\bigwedge_i (X_i = t_i)) \in \mathcal{NC}$, then $\tilde{\neg} \alpha_c(c_1) \geq_a \alpha_c(\neg^c(c_1))$.

Now that we have defined the abstract constraints domain and the abstract 
operations, we can define the abstract fixpoint operator. 

Definition 31. Let $\alpha(P)$ be the program obtained by replacing every constraint
c in a clause of P by $\alpha_c(c)$.

The abstract fixpoint operator: 1“ — >■ 1“ is defined as follows, Tp (/“) = 

/“), where the operations are 3, V, A on AMC and V, A on AMC x AMC. 

By definition, $T^a_P$ is a congruence with respect to the equivalence classes of the
abstract domain. Note also that $T^a_P$ is monotonic on $(\mathcal{I}^a, \leq)$, because $I \leq J$
implies $\downarrow I \subseteq \downarrow J$.

Lemma 32. $T^a_P$ is monotonic on $(\mathcal{I}^a, \leq)$.

The proof that the abstract operator is correct w.r.t. the concrete one, is based 
on the correctness of the abstract operations on the abstract constraints domain. 



Theorem 33. a(r|(7(J“))) < Then a{lfp{T^)) < 

Consider now a k greater than the maximal depth of the terms involved in the 
constraints of the clauses in the program P. In this case the abstract operator 
is also optimal. 

Theorem 34. Tf{I^) < 

Let us finally discuss termination properties of the dataflow analyses presented in
this section. First note that the set of non-equivalent (w.r.t. $\mathcal{H}$) constraints
belonging to $\mathcal{ANC}$ is finite.

Lemma 35. Assume that the signature of the program has a finite number of
function and predicate symbols. Our depth(k) abstraction is ascending chain
finite.



4.5 An Example 

We now show how the depth-k analysis works on an example. The program of
Figure 1 computes the union of two sets represented as lists. We denote the
equivalence class of $T^a_P$ by $T^a_P$ itself. All the computed constraints for the
predicate $\neg$member are shown, while concerning the predicate $\neg$union, for the
sake of simplicity, we choose to show just a small subset of the computed answer
constraints (written in the more compact standard disjunctive form). Therefore,
the concretization of the set of answer constraints for $\neg$union that we present in
Figure 1 contains some answer constraints computed by the concrete semantics
but not all of them.

As expected the set of answer constraints, computed by the abstract fixpoint
operator, is an approximation of the answer constraints, computed by the
concrete operator, for the predicates member, union and $\neg$member. For example, for
the predicate ^member{X,Y), we compute the answer \/L{Y ^ [X,L]) which 
correctly approximates the concrete answer \/L,H,Hi,Li{Y [A, L] AY ^ 
[H, Hi,Li\). While the constraint answer 3AViLi, Li3Zi, Z 2 (A = [X, Z\] AC = 
[X, Z 2 ]AB yf [Hi,Li]) for union{A, B, C), approximates the concrete constraint 
A = [A, A], C = [A, A, A], B = K and B is not a list, computed by the con- 
crete semantics. Note, in fact, that, if the second argument is not a list, the 
predicate member finitely fails. Let us now consider Marriott and Søndergaard's
abstraction for the program P, with a language where the only constant is a (this
assumption does not affect the result). Concerning the predicate union with the
empty list as first argument, their abstraction computes the following atoms
union([ ], a, a), union([ ], [ ], [ ]), union([ ], [a], [a]), union([ ], [a, $Z_1$], [a, $Z_2$]),
while we obtain the more precise answer $(A = [\,] \wedge B = C) \,|\, union(A, B, C)$.
P:

union(A,B,C) :- A = [ ], B = C.
union(A,B,C) :- A = [X,L], C = [X,K], ¬member(X,B), union(L,B,K).
union(A,B,C) :- A = [X,L], member(X,B), union(L,B,C).
member(X,Y) :- Y = [X,L].
member(X,Y) :- Y = [H,L], member(X,L).

Consider now a depth-2 analysis with $Z_i \in W$.

'7^S“ 

tp 

3L( Y=[X,L] )|member(X.y). 

3H, Zi ( Y = [H, Zi] )\member{X, Y) . 

A = [\hB = C \unim(A,B,C). 

3X,L( A=\X]hB = [X,L]^B = C )\unimi(A,B,C). 

3X,Y,Zi( A=[X\hB = [Y,Zl\^B = C )\union(A,B,C). 

3X,H,L,Zi( A = [X,Z,\hB = [X,L]hB = C )\union(A,B,C). 

3X,H,Zi,Z2( A = [X,Z 2 \hB = \H,Z 2 \hB = C )\union{A, B,C). 

3X, K\/H, L{ A = [X] A C = [X, K] A B ^ [H, L] A B = K )\union{A, B, C). 
3X'iH,L3Zi,Z2{ A = [X,Zi]AC = [X,Z 2 ]AB )\union{A,B,C). 

3X, KVL{ A=\X]AC = [X,K]AB^[X,L\AB = K )\union{A, B, C). 
3X^L3 Zi,Z2( A = [X,Zi]AC = [X,Z 2 ]AB ^\X,L] )\union{A,B,C). 

A subset of Tp (complete for the predicate member) 

^H,L{ Y^[H,L] )|raem6er(X,Y). 

VL( Yjb[X,L] )\member{X,Y). 

'iX,L,KXi,Li ((Ayf [] A Ayf [X,L])V 

[B^CAA^ [X,L]V 

(B yf C A C yf [X. A]) A A ^ [Xi, Li]) )|umon(A. B, C). 

~iX,L,KXi,Li,H,L2 ((Ayf [] A Ayf [X.L])V 

(ByfCAAyf [X,L]V 
(B yf C A C ^ [X. X]) A A yf [Xi. Li])v 
(B yf C A C yf [X. B] A B yf [B-, L2])V 
(A yf [ ] A C yf [X, L] A B yf [B. La]) |)urao™(A, B, C) 



Fig. 1. Example 1 
The atom union([ ], [a, $Z_1$], [a, $Z_2$]), in fact, correctly approximates the
predicates deeper than k which have a successful behavior, but it has lost the relation
between B and C. As a consequence all the other ground atoms for union
computed using the atom union([ ], [a, $Z_1$], [a, $Z_2$]) are less precise than the ground
instances of the atoms computed by our non-ground abstract semantics.

5 Conclusion 

Starting from the hierarchy of semantics defined in [?], our aim was to show
that well-known analyses for logic programs could be extended to normal logic
programs. Based on the framework of abstract interpretation [?], we have
presented a depth(k) analysis which is able to approximate the answer set of normal
logic programs.

It is worth noting that our depth(k) analysis can be easily generalized to
constraint logic programs defined on $\mathcal{H}$, whose program constraints can be
conjunctions of equalities and inequalities. In order to deal with constructive negation,
in fact, most of the results presented in this paper hold for first order equality
constraints. The only exception is Lemma 30 (and consequently Theorem 33 and
Theorem 34), which is true only for conjunctions of equalities. But a more
complex definition of the abstract not operator can be defined and proven correct on
conjunctions of equality and inequality constraints. This alternative definition
is, however, less precise than the one defined here. As a consequence Theorem
33, where the abstract fixpoint operator uses the new abstract not operator, still
holds for such "extended" logic programs, while this is not the case for Theorem
34.
Abstract. It is important that practical data flow analysers are backed
by reliably proven theoretical results. Abstract interpretation provides a 
sound mathematical framework and necessary generic properties for an 
abstract domain to be well-defined and sound with respect to the con- 
crete semantics. In logic programming, the abstract domain Sharing is a 
standard choice for sharing analysis for both practical work and further 
theoretical study. In spite of this, we found that there were no satisfactory 
proofs for the key properties of commutativity and idempotence that are 
essential for Sharing to be well-defined and that published statements of 
the safeness property assumed the occur-check. This paper provides a 
generalisation of the abstraction function for Sharing that can be applied 
to any language, with or without the occur-check. The results for safe- 
ness, idempotence and commutativity for abstract unification using this 
abstraction function are given. 

Keywords: abstract interpretation, logic programming, occur-check, ra- 
tional trees, set-sharing. 



1 Introduction 

Today, talking about sharing analysis for logic programs is almost the same 
as talking about the set-sharing domain Sharing of Jacobs and Langen [8,9]. 
Researchers are primarily concerned with extending the domain with linearity, 
freeness, depth-k abstract substitutions and so on [2, 4, 12, 13, 16]. Key properties
such as commutativity and soundness of this domain and its associated abstract 
operations are normally assumed to hold. The main reason for this is that [9] 
not only includes a proof of the soundness but also refers the reader to the thesis 
of Langen [14] for proofs of commutativity and idempotence. 

In abstract interpretation, the concrete semantics of a program is approxi- 
mated by an abstract semantics. In particular, the concrete domain is replaced 
by an abstract domain and each elementary operation on the concrete domain is
replaced by a corresponding abstract operation on the abstract domain. Thus, 
assuming the global abstract procedure mimics the concrete execution proce- 
dure, each operation on elements in the abstract domain must produce an ap- 
proximation of the corresponding operation on corresponding elements in the 
concrete domain. The key operation in a logic programming derivation is unifi- 
cation (unify) and the corresponding operation for an abstract domain is aunify. 

An important step in standard unification algorithms is the occur-check that 
avoids the generation of infinite data structures. However, in computational 
terms, it is expensive and it is well known that Prolog implementations by de- 
fault omit this check. Although standard unification algorithms that include the 
occur-check produce a substitution that is idempotent, the resulting substitution 
when the occur-check is omitted, may not be idempotent. In spite of this, most 
theoretical work on data-flow analysis of logic programming assumes the result
of unify is always idempotent. In particular both [9] and [14] assume in their 
proofs of soundness that the concrete substitutions are idempotent. Thus their 
results do not apply to the analysis of all Prolog programs. 

If two terms in the concrete domain are unifiable, then unify computes the 
most general unifier (mgu). Up to renaming of variables, an mgu is unique. More- 
over a substitution is defined as a set of bindings or equations between variables 
and other terms. Thus, for the concrete domain, the order and multiplicity of 
elements are irrelevant in both the computation and semantics of unify. It is 
therefore useful that the abstraction of the unification procedure should be un- 
affected by the order and multiplicity in which it abstracts the bindings that 
are present in the substitution. Furthermore, from a practical perspective, it is 
useful if the global abstract procedure can proceed in a different order to the 
concrete one without affecting the accuracy of the analysis results. Hence, it is 
extremely desirable that aunify is also commutative and idempotent. However, 
as discussed later in this paper, only a weak form of idempotence has ever been 
proved while the only previous proof of commutativity [14] is seriously flawed. 

As sharing is normally combined with linearity and freeness domains that
are not idempotent or commutative [2, 12], it may be asked why these properties
are important for sharing. In answer to this, we observe that the order and 
multiplicity in which the bindings in a substitution are analysed affects the 
accuracy of the linearity and freeness domains. It is therefore a real advantage 
to be able to ignore these aspects as far as the sharing domain is concerned. 

This paper provides a generalisation of the abstraction function for Sharing 
that can be applied to any language, with or without the occur-check. The results 
for safeness, idempotence and commutativity for abstract unification using this 
abstraction function are given. Detailed proofs of the results stated in this paper 
are available in [7]. 

In the next section, the notation and definitions needed for equality and 
substitutions in the concrete domain are given. In Section 3, we introduce a 
new concept called variable-idempotence that generalises idempotence to allow 
for rational trees. In Section 4, we recall the definition of Sharing and define its 
abstraction function, generalised to allow for non-idempotent substitutions. We
conclude in Section 5. 

2 Equations and Substitutions 

2.1 Notation 

For a set S, $\# S$ is the cardinality of S, $\wp(S)$ is the powerset of S, whereas $\wp_f(S)$
is the set of all the finite subsets of S. The symbol Vars denotes a denumerable
set of variables, whereas $\mathcal{T}_{Vars}$ denotes the set of first-order terms over Vars for
some given set of function symbols. The set of variables occurring in a syntactic
object o is denoted by vars(o).

2.2 Substitutions 

If $x \in Vars$ and $s \in \mathcal{T}_{Vars}$, then $x \mapsto s$ is called a binding. A substitution is a
total function $\sigma : Vars \rightarrow \mathcal{T}_{Vars}$ that is the identity almost everywhere; in other
words, the domain of $\sigma$,

$dom(\sigma) \stackrel{\mathrm{def}}{=} \{\, x \in Vars \mid \sigma(x) \neq x \,\}$,

is finite. If $t \in \mathcal{T}_{Vars}$, we write $t\sigma$ to denote $\sigma(t)$.

Substitutions are denoted by the set of their bindings, thus $\sigma$ is identified
with the set $\{\, x \mapsto \sigma(x) \mid x \in dom(\sigma) \,\}$. The composition of substitutions
is defined in the usual way. Thus $\tau \circ \sigma$ is the substitution such that, for all
terms t, $(\tau \circ \sigma)(t) = \tau(\sigma(t))$. A substitution is said to be circular if it has the form
$\{x_1 \mapsto x_2, \ldots, x_{n-1} \mapsto x_n, x_n \mapsto x_1\}$. A substitution is in rational solved form
if it has no circular subset. The set of all substitutions in rational solved form is
denoted by Subst.

2.3 Equations 

An equation is of the form $s = t$ where $s, t \in \mathcal{T}_{Vars}$. Eqs denotes the set of all
equations.

We are concerned in this paper to keep the results on sharing as general as
possible. In particular, we do not want to restrict ourselves to a specific equality
theory. Thus we allow for any equality theory T over $\mathcal{T}_{Vars}$ that includes the
basic axioms denoted by the following schemata.

$s = s$, (1)

$s = t \iff t = s$, (2)

$r = s \wedge s = t \implies r = t$, (3)

$f(s_1, \ldots, s_n) = f(t_1, \ldots, t_n) \impliedby s_1 = t_1, \ldots, s_n = t_n$. (4)

Of course, T can include other axioms. For example, it is usual in logic
programming and most implementations of Prolog to assume an equality theory
based on syntactic identity and characterised by the axiom schemata given by
Clark [3]. This consists of the basic axioms together with the following:

,Sn) = ,tm) (5) 

$\forall z \in Vars\ \forall t \in (\mathcal{T}_{Vars} \setminus Vars) : z \in vars(t) \implies \neg(z = t)$. (6)

The identity axioms characterised by the schemata 5 ensure the equality theory 
is Herbrand and depends only on the syntax. Equality theory for a non-Herbrand 
domain replaces these axioms by ones that depend instead on the semantics of 
the domain. Axioms characterised by the schemata 6 are called the occur-check 
axioms and are an essential part of the standard unification procedure in SLD- 
resolution. 

An alternative approach used in some implementations of Prolog, does not 
require the occur-check axioms. This approach is based on the theory of rational 
trees [5,6]. It assumes the basic axioms and the identity axioms together with 
a set of uniqueness axioms [10, 11]. These state that each equation in rational 
solved form uniquely defines a set of trees. Thus, an equation z = t where 
$z \in vars(t)$ and $t \in (\mathcal{T}_{Vars} \setminus Vars)$ denotes the axiom (expressed in terms of the
usual first-order quantifiers [15]):

$\forall x \in Vars : (z = t \wedge (x = t\{z \mapsto x\} \implies z = x))$.

The basic axioms defined by schemata 1, 2, 3, and 4, which are all that are 
required for the results in this paper, are included in both these theories. 

A substitution $\sigma$ may be regarded as a set of equations $\{\, x = t \mid x \mapsto t \in \sigma \,\}$.
A set of equations $e \in \wp_f(Eqs)$ is unifiable if there is $\sigma \in Subst$ such that
$T \vdash (\sigma \implies e)$. $\sigma$ is called a unifier for e. $\sigma$ is said to be a relevant unifier of e if
$vars(\sigma) \subseteq vars(e)$. That is, $\sigma$ does not introduce any new variables. $\sigma$ is a most
general unifier for e if, for every unifier $\sigma'$ of e, $T \vdash (\sigma' \implies \sigma)$. An mgu, if it
exists, is unique up to the renaming of variables. In this paper, mgu(e) always
denotes a relevant unifier of e.



3 Variable-Idempotence 

It is usual in papers on sharing analysis to assume that all the substitutions
are idempotent. Note that a substitution $\sigma$ is idempotent if, for all $t \in \mathcal{T}_{Vars}$,
$t\sigma\sigma = t\sigma$. However, the sharing domain is just concerned with the variables. So,
to allow for substitutions representing rational trees, we generalise idempotence
to variable-idempotence.

Definition 1. A substitution $\sigma$ is variable-idempotent if

$\forall t \in \mathcal{T}_{Vars} : vars(t\sigma\sigma) = vars(t\sigma)$.

The set of all variable-idempotent substitutions is denoted by VSubst.

It is convenient to use the following alternative characterisation of
variable-idempotence: a substitution $\sigma$ is variable-idempotent if and only if

$\forall (x \mapsto t) \in \sigma : vars(t\sigma) = vars(t)$.

Thus any substitution consisting of a single binding is variable-idempotent.
Moreover, all idempotent substitutions are also variable-idempotent.

Example 1. The substitution $\{x \mapsto f(x)\}$ is not idempotent but is
variable-idempotent. Also, $\{x \mapsto f(y), y \mapsto z\}$ is not idempotent or variable-idempotent
but is equivalent (with respect to some equality theory T) to $\{x \mapsto f(z), y \mapsto z\}$,
which is idempotent.

We define the transformation $\stackrel{S}{\longmapsto} \subseteq Subst \times Subst$, called S-transformation,
as follows:

$$\frac{(x \mapsto t) \in \sigma \qquad (y \mapsto s) \in \sigma \qquad x \neq y}{\sigma \stackrel{S}{\longmapsto} (\sigma \setminus \{y \mapsto s\}) \cup \{y \mapsto s[x/t]\}}$$

Any substitution $\sigma$ can be transformed to a variable-idempotent substitution $\sigma'$
for $\sigma$ by a finite sequence of S-transformations. Furthermore, if the substitutions
$\sigma$ and $\sigma'$ are regarded as equations, then they are equivalent with respect to any
equality theory that includes the basic equality axioms. These two statements
are direct consequences of Lemmas 1 and 2, respectively.



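Lemma 1. Let $T$ be an equality theory that satisfies the basic equality axioms.
Suppose that $\sigma$ and $\sigma'$ are substitutions such that $\sigma \stackrel{S}{\longmapsto} \sigma'$. Then, regarding $\sigma$
and $\sigma'$ as sets of equations, $T \vdash \forall(\sigma \leftrightarrow \sigma')$.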

Proof. Suppose that $(x \mapsto t), (y \mapsto s) \in \sigma$ where $x \neq y$ and suppose also
$\sigma' = (\sigma \setminus \{y \mapsto s\}) \cup \{y \mapsto s[x/t]\}$. We first show by induction on the depth of
the term $s$ that

$$x = t \Rightarrow s = s[x/t].$$

Suppose $s$ has depth 1. If $s$ is $x$, then $s[x/t] = t$ and the result is trivial. If $s$ is
a variable distinct from $x$ or a constant, then $s[x/t] = s$ and the result follows
from equality Axiom 1. Suppose now that $s = f(s_1, \ldots, s_n)$ and the result holds
for all terms of depth less than that of $s$. Then, by the inductive hypothesis, for
each $i = 1, \ldots, n$,

$$x = t \Rightarrow s_i = s_i[x/t].$$

Hence, by Axiom 4,

$$x = t \Rightarrow f(s_1, \ldots, s_n) = f(s_1[x/t], \ldots, s_n[x/t])$$

and hence

$$x = t \Rightarrow f(s_1, \ldots, s_n) = f(s_1, \ldots, s_n)[x/t].$$

Thus, combining this result with Axiom 3, we have

$$\{x = t, y = s\} \Rightarrow \{x = t, y = s, s = s[x/t]\} \Rightarrow \{x = t, y = s[x/t]\}.$$

Similarly, combining this result with Axioms 2 and 3,

$$\{x = t, y = s[x/t]\} \Rightarrow \{x = t, y = s[x/t], s = s[x/t]\} \Rightarrow \{x = t, y = s\}.$$

□

Note that the condition $x \neq y$ in Lemma 1 is necessary. For example, suppose
$\sigma = \{x \mapsto f(x)\}$ and $\sigma' = \{x \mapsto f(f(x))\}$. Then we do not have $\sigma' \Rightarrow \sigma$.

Lemma 2. Suppose that, for each $j = 0, \ldots, n$:

$$\sigma_j = \{x_1 \mapsto t_{1,j}, \ldots, x_n \mapsto t_{n,j}\},$$

where $t_{j,j} = t_{j,j-1}$ and if $j > 0$, for each $i = 1, \ldots, n$, where $i \neq j$,
$t_{i,j} = t_{i,j-1}[x_j/t_{j,j-1}]$. Then, for each $j = 0, \ldots, n$,

$$\nu_j = \{x_1 \mapsto t_{1,j}, \ldots, x_j \mapsto t_{j,j}\}$$

is variable-idempotent and, if $j > 0$, $\sigma_j$ can be obtained from $\sigma_{j-1}$ by a sequence
of S-transformations.

Proof. The proof is by induction on $j$. Since $\nu_0$ is empty, the base case when
$j = 0$ is trivial. Suppose, therefore that $1 \leq j \leq n$ and the hypothesis holds
for $\nu_{j-1}$ and $\sigma_{j-1}$. By the definition of $\nu_j$, we have $\nu_j = \{x_j \mapsto t_{j,j-1}\} \circ \nu_{j-1}$.
Consider an arbitrary $i$, $1 \leq i \leq j$. We will show that $vars(t_{i,j}\nu_j) = vars(t_{i,j})$.

Suppose first that $i = j$. Then since $t_{j,j} = t_{j,j-1}$, $t_{j,j-1} = t_{j,0}\nu_{j-1}$ and, by
the inductive hypothesis, $vars(t_{j,0}\nu_{j-1}\nu_{j-1}) = vars(t_{j,0}\nu_{j-1})$, we have

$$vars(t_{j,j}\nu_j) = vars(t_{j,0}\nu_{j-1}\nu_{j-1}\{x_j \mapsto t_{j,j}\})$$
$$= vars(t_{j,0}\nu_{j-1}\{x_j \mapsto t_{j,j}\})$$
$$= vars(t_{j,j}\{x_j \mapsto t_{j,j}\})$$
$$= vars(t_{j,j}).$$

Suppose now that $i \neq j$. Then,

$$vars(t_{i,j}) = vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\})$$

and, by the inductive hypothesis, $vars(t_{i,j-1}\nu_{j-1}) = vars(t_{i,j-1})$.

If $x_j \notin vars(t_{i,j-1})$, then

$$vars(t_{i,j}\nu_{j-1}) = vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\}\nu_{j-1})$$
$$= vars(t_{i,j-1}\nu_{j-1})$$
$$= vars(t_{i,j}).$$

On the other hand, if $x_j \in vars(t_{i,j-1})$, then

$$vars(t_{i,j}\nu_{j-1}) = vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\}\nu_{j-1})$$
$$= vars(t_{i,j-1}\nu_{j-1}) \setminus \{x_j\} \cup vars(t_{j,j-1}\nu_{j-1})$$
$$= vars(t_{i,j-1}) \setminus \{x_j\} \cup vars(t_{j,j-1})$$
$$= vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\})$$
$$= vars(t_{i,j}).$$

Thus, in both cases,

$$vars(t_{i,j}\nu_j) = vars(t_{i,j}\nu_{j-1}\{x_j \mapsto t_{j,j-1}\})$$
$$= vars(t_{i,j}\{x_j \mapsto t_{j,j-1}\})$$
$$= vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\}\{x_j \mapsto t_{j,j-1}\}).$$

However, a substitution consisting of a single binding is variable-idempotent.
Thus

$$vars(t_{i,j}\nu_j) = vars(t_{i,j-1}\{x_j \mapsto t_{j,j-1}\})$$
$$= vars(t_{i,j}).$$

Therefore, for each $i = 1, \ldots, j$, $vars(t_{i,j}\nu_j) = vars(t_{i,j})$. It then follows (using
the alternative characterisation of variable-idempotence) that $\nu_j$ is
variable-idempotent. □

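Example 2. Let

$$\sigma_0 = \{x_1 \mapsto f(x_2), x_2 \mapsto g(x_3, x_4), x_3 \mapsto x_1\}.$$

Then

$$\sigma_1 = \{x_1 \mapsto f(x_2), x_2 \mapsto g(x_3, x_4), x_3 \mapsto f(x_2)\},$$
$$\sigma_2 = \{x_1 \mapsto f(g(x_3, x_4)), x_2 \mapsto g(x_3, x_4), x_3 \mapsto f(g(x_3, x_4))\},$$
$$\sigma_3 = \{x_1 \mapsto f(g(f(g(x_3, x_4)), x_4)), x_2 \mapsto g(f(g(x_3, x_4)), x_4), x_3 \mapsto f(g(x_3, x_4))\}.$$

Note that $\sigma_3$ is variable-idempotent and that $T \vdash \forall(\sigma_0 \leftrightarrow \sigma_3)$.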
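
The construction of Lemma 2 can be replayed mechanically. The sketch below is only an illustration under an assumed term encoding (variables as strings, compound terms as tuples headed by the functor); the name `lemma2_sequence` is hypothetical. Each step replaces $x_j$ by its current binding in all the other bindings, which amounts to a sequence of S-transformations.

```python
def term_vars(t):
    """Variables of a term (str = variable, tuple = functor with arguments)."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def replace(t, x, by):
    """Compute t[x/by]: every occurrence of variable x in t is replaced by `by`."""
    if isinstance(t, str):
        return by if t == x else t
    return (t[0],) + tuple(replace(a, x, by) for a in t[1:])


def apply(t, sigma):
    """Apply the substitution sigma once to t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(a, sigma) for a in t[1:])


def is_variable_idempotent(sigma):
    return all(term_vars(apply(t, sigma)) == term_vars(t)
               for t in sigma.values())


def lemma2_sequence(bindings):
    """Return [sigma_0, ..., sigma_n] for bindings given as (x_i, t_i) pairs."""
    sigmas = [dict(bindings)]
    for xj, _ in bindings:
        prev = sigmas[-1]
        tj = prev[xj]                      # t_{j,j-1}, kept unchanged
        sigmas.append({x: (t if x == xj else replace(t, xj, tj))
                       for x, t in prev.items()})
    return sigmas


# sigma_0 of Example 2
sigma0 = [("x1", ("f", "x2")), ("x2", ("g", "x3", "x4")), ("x3", "x1")]
sigmas = lemma2_sequence(sigma0)
for s in sigmas:
    print(s, is_variable_idempotent(s))
```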

4 Set-Sharing 

4.1 The Sharing Domain 

The Sharing domain is due to Jacobs and Langen [8]. However, we use the
definition as presented in [1].

Definition 2. (The set-sharing lattice.) Let

$$SG := \{ S \in \wp_f(Vars) \mid S \neq \emptyset \}$$

and let $SH := \wp(SG)$. The set-sharing lattice is given by the set

$$SS := \{ (sh, U) \mid sh \in SH, U \in \wp_f(Vars), \forall S \in sh : S \subseteq U \} \cup \{\bot, \top\}$$

ordered by $\preceq_{SS}$ defined as follows, for each $d, (sh_1, U_1), (sh_2, U_2) \in SS$:

$$\bot \preceq_{SS} d,$$
$$d \preceq_{SS} \top,$$
$$(sh_1, U_1) \preceq_{SS} (sh_2, U_2) \iff (U_1 = U_2) \wedge (sh_1 \subseteq sh_2).$$

It is straightforward to see that every subset of SS has a least upper bound with
respect to $\preceq_{SS}$. Hence SS is a complete lattice.^1

An element $sh$ of SH abstracts the property of sharing in a substitution $\sigma$.
That is, if $\sigma$ is idempotent, two variables $x, y$ must be in the same set in $sh$
if some variable, say $v$, occurs in both $x\sigma$ and $y\sigma$. In fact, this is also true for
variable-idempotent substitutions although it is shown below that this needs to
be generalised for substitutions that are not variable-idempotent. Thus, the
definition of the abstraction function $\alpha$ for sharing requires an ancillary definition
for the notion of occurrence.

Definition 3. (Occurrence.)

For each $n \in \mathbb{N}$, $occ_n : Subst \times Vars \rightarrow \wp_f(Vars)$ is defined for each $\sigma \in Subst$
and each $v \in Vars$:

$$occ_0(\sigma, v) := \{v\}, \text{ if } v = v\sigma;$$
$$occ_0(\sigma, v) := \emptyset, \text{ if } v \neq v\sigma;$$
$$occ_n(\sigma, v) := \{ y \in Vars \mid x \in vars(y\sigma) \cap occ_{n-1}(\sigma, v) \}, \text{ if } n > 0.$$

It follows that, for fixed values of $\sigma$ and $v$, $occ_n(\sigma, v)$ is monotonic and extensive
with respect to the index $n$. Hence, as the range of $occ_n(\sigma, v)$ is restricted to the
finite set of variables in $\sigma$, there is an $\ell = \ell(\sigma, v) \in \mathbb{N}$ such that
$occ_\ell(\sigma, v) = occ_n(\sigma, v)$ for all $n \geq \ell$. Let

$$occ!(\sigma, v) := occ_\ell(\sigma, v).$$

Note that if $\sigma$ is variable-idempotent, then $occ!(\sigma, v) = occ_1(\sigma, v)$. Note also
that if $v \neq v\sigma$, then $occ!(\sigma, v) = \emptyset$. Previous definitions for an occurrence
operator such as that for $sg$ in [8] have all been for idempotent substitutions.
However, when $\sigma$ is an idempotent substitution, $occ!(\sigma, v)$ and $sg(\sigma, v)$ are the
same for all $v \in Vars$.

We base the definition of abstraction on the occurrence operator, $occ!$.

^1 Notice that the only reason we have $\top \in SS$ is in order to turn SS into a lattice
rather than a CPO.




The Correctness of Set-Sharing 107 



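Definition 4. (Abstraction.) The concrete domain Subst is related to SS by
means of the abstraction function $\alpha : \wp(Subst) \times \wp_f(Vars) \rightarrow SS$. For each
$\Sigma \in \wp(Subst)$ and each $U \in \wp_f(Vars)$,

$$\alpha(\Sigma, U) := \bigsqcup_{\sigma \in \Sigma} \alpha(\sigma, U),$$

where $\alpha : Subst \times \wp_f(Vars) \rightarrow SS$ is defined, for each $\sigma \in Subst$ and each
$U \in \wp_f(Vars)$, by

$$\alpha(\sigma, U) := \bigl( \{ occ!(\sigma, v) \cap U \mid v \in Vars \} \setminus \{\emptyset\},\ U \bigr).$$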

The following result states that the abstraction for a substitution $\sigma$ is the
same as the abstraction for a variable-idempotent substitution for $\sigma$.

Lemma 3. Let $\sigma$ be a substitution, $\sigma'$ a substitution obtained from $\sigma$ by a
sequence of S-transformations, $U$ a set of variables and $v \in Vars$. Then

$$v = v\sigma \iff v = v\sigma', \quad occ!(\sigma, v) = occ!(\sigma', v), \quad \text{and} \quad \alpha(\sigma, U) = \alpha(\sigma', U).$$

Proof. Suppose first that $\sigma'$ is obtained from $\sigma$ by a single S-transformation.
Thus we can assume that $x \mapsto t$ and $y \mapsto s$ are in $\sigma$ where $x \in vars(s)$ and that

$$\sigma' = (\sigma \setminus \{y \mapsto s\}) \cup \{y \mapsto s[x/t]\}.$$

It follows that, since $\sigma$ is in rational solved form, $\sigma$ has no circular subset and
hence $v = v\sigma \iff v = v\sigma'$. Thus, if $v \neq v\sigma$, then we have $v \neq v\sigma'$ and

$$occ!(\sigma, v) = occ!(\sigma', v) = \emptyset.$$

We now assume that $v = v\sigma = v\sigma'$ and prove that

$$occ_m(\sigma, v) \subseteq occ!(\sigma', v).$$

The proof is by induction on $m$. By Definition 3, $occ_0(\sigma, v) = occ_0(\sigma', v) = \{v\}$,
so that the result holds for $m = 0$. Suppose then that $m > 0$ and that
$v_m \in occ_m(\sigma, v)$. By Definition 3, there exists $v_{m-1} \in vars(v_m\sigma)$ where
$v_{m-1} \in occ_{m-1}(\sigma, v)$. Hence, by the inductive hypothesis, $v_{m-1} \in occ!(\sigma', v)$. If
$v_{m-1} \in vars(v_m\sigma')$, then, by Definition 3, $v_m \in occ!(\sigma', v)$. On the other hand, if
$v_{m-1} \notin vars(v_m\sigma')$, then $v_m = y$, $v_{m-1} = x$, and $x \in vars(s)$ (so that
$vars(t) \subseteq vars(s[x/t])$). However, by hypothesis, $v = v\sigma$, so that $x \neq v$ and $m > 1$. Thus,
by Definition 3, there exists $v_{m-2} \in vars(t)$ such that $v_{m-2} \in occ_{m-2}(\sigma, v)$.
By the inductive hypothesis, $v_{m-2} \in occ!(\sigma', v)$. Since $y \mapsto s[x/t] \in \sigma'$, and
$v_{m-2} \in vars(s[x/t])$, $v_{m-2} \in vars(y\sigma')$. Thus, by Definition 3, $y \in occ!(\sigma', v)$.

Conversely, we now prove that, for all $m$,

$$occ_m(\sigma', v) \subseteq occ!(\sigma, v).$$

The proof is again by induction on $m$. As in the previous case, $occ_0(\sigma', v) =
occ_0(\sigma, v) = \{v\}$, so that the result holds for $m = 0$. Suppose then that $m > 0$ and
that $v_m \in occ_m(\sigma', v)$. By Definition 3, there exists $v_{m-1} \in vars(v_m\sigma')$ where
$v_{m-1} \in occ_{m-1}(\sigma', v)$. Hence, by the inductive hypothesis, $v_{m-1} \in occ!(\sigma, v)$. If
$v_{m-1} \in vars(v_m\sigma)$, then, by Definition 3, $v_m \in occ!(\sigma, v)$. On the other hand,
if $v_{m-1} \notin vars(v_m\sigma)$, then $v_m = y$, $v_{m-1} \in vars(t)$ and $x \in vars(s)$. Thus, as
$y \mapsto s \in \sigma$, $x \in vars(y\sigma)$. However, since $x \mapsto t \in \sigma$, $v_{m-1} \in vars(x\sigma)$ so that,
by Definition 3, $x \in occ!(\sigma, v)$. Thus, again by Definition 3, $y \in occ!(\sigma, v)$.

Thus, if $\sigma'$ is obtained from $\sigma$ by a single S-transformation, we have the
required results: $v = v\sigma \iff v = v\sigma'$, $occ!(\sigma, v) = occ!(\sigma', v)$, and
$\alpha(\sigma, U) = \alpha(\sigma', U)$.

Suppose now that there is a sequence $\sigma = \sigma_1, \ldots, \sigma_n = \sigma'$ such that, for
$i = 2, \ldots, n$, $\sigma_i$ is obtained from $\sigma_{i-1}$ by a single S-step. If $n = 1$, then $\sigma = \sigma'$.
If $n > 1$, we have by the first part of the proof that, for each $i = 2, \ldots, n$,

$$v = v\sigma_{i-1} \iff v = v\sigma_i, \quad occ!(\sigma_{i-1}, v) = occ!(\sigma_i, v), \quad \text{and} \quad \alpha(\sigma_{i-1}, U) = \alpha(\sigma_i, U),$$

and hence the required results. □

Example 3. Consider again Example 2. Then

$$occ_1(\sigma_0, x_4) = \{x_2, x_4\},$$
$$occ_2(\sigma_0, x_4) = \{x_1, x_2, x_4\},$$
$$occ_3(\sigma_0, x_4) = \{x_1, x_2, x_3, x_4\} = occ!(\sigma_0, x_4),$$

and

$$occ_1(\sigma_3, x_4) = \{x_1, x_2, x_3, x_4\} = occ!(\sigma_3, x_4).$$

Thus, if $V = \{x_1, x_2, x_3, x_4\}$,

$$\alpha(\sigma_0, V) = \alpha(\sigma_3, V) = (\{\{x_1, x_2, x_3, x_4\}\}, V).$$
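
The occurrence operator and the abstraction function can be computed by a simple fixpoint iteration. The following Python sketch, given purely as an illustration, recomputes Example 3; the term encoding (strings for variables, tuples for compound terms) and the names `occ_fix` and `alpha` are assumptions of the sketch. Since only finitely many variables matter, the sketch restricts $Vars$ to the variables of $\sigma$ together with $U$.

```python
def term_vars(t):
    """Variables of a term (str = variable, tuple = functor with arguments)."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def occ_fix(sigma, v, universe):
    """occ!(sigma, v): iterate occ_n from occ_0 until it stabilises."""
    if sigma.get(v, v) != v:           # v is bound: occ_0 is empty
        return frozenset()
    current = {v}                      # occ_0(sigma, v)
    while True:
        nxt = {y for y in universe
               if term_vars(sigma.get(y, y)) & current}
        if nxt == current:
            return frozenset(current)
        current = nxt


def alpha(sigma, U):
    """Abstraction of a single substitution as a pair (sh, U)."""
    universe = set(U) | set(sigma)
    universe |= set().union(*(term_vars(t) for t in sigma.values()))
    sh = {occ_fix(sigma, v, universe) & frozenset(U) for v in universe}
    sh.discard(frozenset())
    return frozenset(sh), frozenset(U)


fg = ("f", ("g", "x3", "x4"))
sigma0 = {"x1": ("f", "x2"), "x2": ("g", "x3", "x4"), "x3": "x1"}
sigma3 = {"x1": ("f", ("g", fg, "x4")), "x2": ("g", fg, "x4"), "x3": fg}
V = {"x1", "x2", "x3", "x4"}
print(alpha(sigma0, V) == alpha(sigma3, V))
```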

4.2 Abstract Operations for Sharing Sets 

In this paper we are concerned with establishing results for the abstract operation
aunify, which is defined for arbitrary sets of equations. However, by building the
definition of aunify in three steps via the definitions of amgu (for sharing sets)
and Amgu (for sharing domains) and stating corresponding results for each of
them, we provide an outline for the overall method of proof for the aunify results.
Details of all proofs are available in [7].

In order to define the abstract operation amgu we need some ancillary defi- 
nitions. 

Definition 5. (Auxiliary functions.) The closure under union function (also
called star-union), $(\cdot)^* : SH \rightarrow SH$, is, for each $sh \in SH$,

$$sh^* := \{ S \in SG \mid \exists n \geq 1 . \exists T_1, \ldots, T_n \in sh . S = T_1 \cup \cdots \cup T_n \}.$$

For each $sh \in SH$ and each $T \in \wp_f(Vars)$, the extraction of the relevant
component of $sh$ with respect to $T$ is encoded by the function $rel : \wp_f(Vars) \times SH \rightarrow SH$
defined as

$$rel(T, sh) := \{ S \in sh \mid S \cap T \neq \emptyset \}.$$

For each $sh_1, sh_2 \in SH$, the binary union function $bin : SH \times SH \rightarrow SH$ is given
by

$$bin(sh_1, sh_2) := \{ S_1 \cup S_2 \mid S_1 \in sh_1, S_2 \in sh_2 \}.$$

The function $proj : SH \times \wp_f(Vars) \rightarrow SH$ projects an element of SH onto a set
of variables of interest: if $sh \in SH$ and $V \in \wp_f(Vars)$, then

$$proj(sh, V) := \{ S \cap V \mid S \in sh, S \cap V \neq \emptyset \}.$$

Definition 6. (amgu.) The function amgu captures the effects of a binding $x \mapsto t$
on an SH element. Let $x$ be a variable and $t$ a term. Let also $sh \in SH$ and

$$A := rel(\{x\}, sh),$$
$$B := rel(vars(t), sh).$$

Then

$$amgu(sh, x \mapsto t) := (sh \setminus (A \cup B)) \cup bin(A^*, B^*).$$
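
Definitions 5 and 6 translate almost directly into set manipulations. The following Python sketch is one possible rendering, not the implementation used by any particular analyser: sharing sets are frozensets of frozensets of variable names, the binding $x \mapsto t$ is passed as `x` together with `vars(t)`, and the helper name `sh_of` is an assumption of the sketch.

```python
def sh_of(*groups):
    """Build an SH element from iterables of variable names."""
    return frozenset(frozenset(g) for g in groups)


def star(sh):
    """Closure under union of the sharing set sh."""
    closure = set(sh)
    changed = True
    while changed:
        changed = False
        for a in list(closure):
            for b in sh:
                if a | b not in closure:
                    closure.add(a | b)
                    changed = True
    return frozenset(closure)


def rel(T, sh):
    """Sharing groups of sh that have a variable in common with T."""
    return frozenset(S for S in sh if S & frozenset(T))


def bin_union(sh1, sh2):
    return frozenset(a | b for a in sh1 for b in sh2)


def proj(sh, V):
    return frozenset(S & frozenset(V) for S in sh if S & frozenset(V))


def amgu(sh, x, t_vars):
    """Effect of the binding x -> t on sh, where t_vars = vars(t)."""
    A = rel({x}, sh)
    B = rel(t_vars, sh)
    return (sh - (A | B)) | bin_union(star(A), star(B))


# Binding x -> f(y, z) on three initially independent variables.
sh = sh_of(["x"], ["y"], ["z"])
result = amgu(sh, "x", {"y", "z"})
print(sorted(sorted(S) for S in result))
```

Applying `amgu` a second time with the same binding leaves `result` unchanged, in line with the idempotence property stated later in this section.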

Then we have the following soundness result for amgu. 

Lemma 4. Let $(sh, U) \in SS$ and $\{x \mapsto t\}, \sigma, \nu \in Subst$ such that $\nu$ is a relevant
unifier of $\{x\sigma = t\sigma\}$ and $vars(x), vars(t), vars(\sigma) \subseteq U$. Then

$$\alpha(\sigma, U) \preceq_{SS} (sh, U) \Rightarrow \alpha(\nu \circ \sigma, U) \preceq_{SS} (amgu(sh, x \mapsto t), U).$$

To prove this, observe that, by Lemma 2, if $\sigma$ is not variable-idempotent, it
can be transformed to a variable-idempotent substitution $\sigma'$. Hence, by Lemma 3,
$\alpha(\sigma, U) = \alpha(\sigma', U)$. Therefore, the proof, which is given in [7], deals primarily
with the case when $\sigma$ is variable-idempotent.

Since a relevant unifier of $e$ is a relevant unifier of any other set $e'$ equivalent to
$e$ with respect to the equality theory $T$, this lemma shows that it is safe for the analyser
to perform part or all of the concrete unification algorithm before computing
amgu.

The following lemmas, proved in [7], show that amgu is commutative and 
idempotent. 

Lemma 5. Let $sh \in SH$ and $\{x \mapsto r\} \in Subst$. Then

$$amgu(sh, x \mapsto r) = amgu(amgu(sh, x \mapsto r), x \mapsto r).$$

Lemma 6. Let $sh \in SH$ and $\{x \mapsto r\}, \{y \mapsto t\} \in Subst$. Then

$$amgu(amgu(sh, x \mapsto r), y \mapsto t) = amgu(amgu(sh, y \mapsto t), x \mapsto r).$$

4.3 Abstract Operations for Sharing Domains

The definitions and results of Subsection 4.2 can be lifted to apply to sharing 
domains. 

Definition 7. (Amgu.) The operation $Amgu : SS \times Subst \rightarrow SS$ extends the
SS description it takes as an argument to the set of variables occurring in the
binding it is given as the second argument. Then it applies amgu:

$$Amgu((sh, U), x \mapsto t) := \bigl( amgu(sh \cup \{ \{u\} \mid u \in vars(x \mapsto t) \setminus U \}, x \mapsto t),\ U \cup vars(x \mapsto t) \bigr).$$

The results for amgu can easily be extended to apply to Amgu. 

Definition 8. (aunify.) The function $aunify : SS \times Eqs \rightarrow SS$ generalises Amgu
to a set of equations $e$: If $(sh, U) \in SS$, $x$ is a variable, $r$ is a term,
$s = f(s_1, \ldots, s_n)$ and $t = f(t_1, \ldots, t_n)$ are non-variable terms, and $\bar{s} = \bar{t}$ denotes the
set of equations $\{s_1 = t_1, \ldots, s_n = t_n\}$, then

$$aunify((sh, U), \emptyset) := (sh, U),$$

if $e \in \wp_f(Eqs)$ is unifiable,

$$aunify((sh, U), e \cup \{x = r\}) := aunify(Amgu((sh, U), x \mapsto r), e \setminus \{x = r\}),$$
$$aunify((sh, U), e \cup \{s = x\}) := aunify((sh, U), (e \setminus \{s = x\}) \cup \{x = s\}),$$
$$aunify((sh, U), e \cup \{s = t\}) := aunify((sh, U), (e \setminus \{s = t\}) \cup \bar{s} = \bar{t}),$$

and, if $e$ is not unifiable,

$$aunify((sh, U), e) := \bot.$$

For the distinguished elements $\bot$ and $\top$ of SS

$$aunify(\bot, e) := \bot, \qquad aunify(\top, e) := \top.$$

As a consequence of this and the generalisation of Lemmas 4, 5 and 6 to 
Amgu, we have the following soundness, commutativity and idempotence results 
required for aunify to be sound and well-defined. As before, the proofs of these 
results are in [7]. 

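Theorem 1. Let $(sh, U) \in SS$, $\sigma, \nu \in Subst$, and $e \in \wp_f(Eqs)$ be such that
$vars(\sigma) \subseteq U$ and $\nu$ is a relevant unifier of $e$. Then

$$\alpha(\sigma, U) \preceq_{SS} (sh, U) \Rightarrow \alpha(\nu \circ \sigma, U) \preceq_{SS} aunify((sh, U), e).$$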







Theorem 2. Let $(sh, U) \in SS$ and $e \in \wp_f(Eqs)$. Then

$$aunify((sh, U), e) = aunify(aunify((sh, U), e), e).$$

Theorem 3. Let $(sh, U) \in SS$ and $e_1, e_2 \in \wp_f(Eqs)$. Then

$$aunify(aunify((sh, U), e_1), e_2) = aunify(aunify((sh, U), e_2), e_1).$$



5 Discussion 

The SS domain which was first defined by Langen [14] and published by Jacobs 
and Langen [8] is an important domain for sharing analysis. In this paper, we 
have provided a framework for analysing non-idempotent substitutions and pre- 
sented results for soundness, idempotence and commutativity of aunify. In fact, 
most researchers concerned with analysing sharing and related properties using 
the SS domain, assume these properties hold. Why therefore are the results in 
this paper necessary? Let us consider each of the above properties one at a time. 



5.1 Soundness 

We have shown that, for any substitution $\sigma$ over a set of variables $U$, the
abstraction $\alpha(\sigma, U) = (sh, U)$ is unique (Lemma 3) and the aunify operation is sound
(Theorem 1). Note that, in Theorem 1, there are no restrictions on $\sigma$; it can be
non-idempotent, possibly including cyclic bindings (that is, bindings where the
domain variable occurs in its co-domain). Thus this result is widely applicable.

Previous results on sharing have assumed that substitutions are idempotent. 
This is true if equality is syntactic identity and the implementation uses a unifi- 
cation algorithm based on that of Robinson [17] which includes the occur-check. 
With such algorithms, the resulting unifier is both unique and idempotent. Un- 
fortunately, this is not what is implemented by most Prolog systems. 

In particular, if the algorithm is as described in [11] and used in Prolog
III [5], then the resulting unifier is in rational solved form. This algorithm does
not generate idempotent or even variable-idempotent substitutions even when
the occur-check would never have succeeded. However, it has been shown that the
substitution obtained in this way uniquely defines a system of rational trees [5].
Thus our results show that its abstraction using $\alpha$, as defined in this paper, is
also unique and that aunify is sound.

Alternatively, if, as in most commercial Prolog systems, the unification algo- 
rithm is based on the Martelli-Montanari algorithm, but omits the occur check 
step, then the resulting substitution may not be idempotent. Consider the fol- 
lowing example. 

Suppose we are given as input the equation $p(z, f(x, y)) = p(f(z, y), z)$ with
an initial substitution that is empty. We apply the steps in the Martelli-Montanari
procedure but without the occur-check:

     equations                              substitution
  1  $p(z, f(x,y)) = p(f(z,y), z)$          $\emptyset$
  2  $z = f(z,y),\ f(x,y) = z$              $\emptyset$
  3  $f(x,y) = f(z,y)$                      $\{z \mapsto f(z,y)\}$
  4  $x = z,\ y = y$                        $\{z \mapsto f(z,y)\}$
  5  $y = y$                                $\{z \mapsto f(z,y), x \mapsto z\}$
  6  $\emptyset$                            $\{z \mapsto f(z,y), x \mapsto z\}$



Note that we have used three kinds of steps here. In lines 1 and 3, neither
argument of the selected equation is a variable. In this case, the outer
non-variable symbols (when, as in this example, they are the same) are removed
and new equations are formed between the corresponding arguments. In lines
2 and 4, the selected equation has the form $v = t$, where $v$ is a variable and
$t$ is not identical to $v$. In this case, every occurrence of $v$ is replaced by $t$ in all the
remaining equations and in the range of the substitution; $v \mapsto t$ is then added to
the substitution. In line 5, the identity is removed.
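
The three kinds of steps can be replayed with a few lines of code. The following Python sketch is an assumption-laden illustration, not the algorithm of any specific Prolog system: it always selects the first equation, encodes variables as strings and compound terms as tuples, and adds a swap step for equations of the form $t = v$, which the example above does not need. No termination guarantee is claimed for arbitrary inputs.

```python
def is_var(t):
    return isinstance(t, str)


def replace(t, v, by):
    """Replace every occurrence of variable v in term t by the term `by`."""
    if is_var(t):
        return by if t == v else t
    return (t[0],) + tuple(replace(a, v, by) for a in t[1:])


def unify_no_occur_check(equations):
    """Return a substitution (dict) or None on a functor clash."""
    eqs = list(equations)
    subst = {}
    while eqs:
        lhs, rhs = eqs.pop(0)
        if lhs == rhs:
            continue                                   # identity: drop it
        if is_var(lhs):
            # v = t: propagate into remaining equations and into the range
            eqs = [(replace(l, lhs, rhs), replace(r, lhs, rhs))
                   for l, r in eqs]
            subst = {x: replace(t, lhs, rhs) for x, t in subst.items()}
            subst[lhs] = rhs                           # no occur-check here
        elif is_var(rhs):
            eqs.insert(0, (rhs, lhs))                  # t = v becomes v = t
        elif lhs[0] == rhs[0] and len(lhs) == len(rhs):
            eqs = list(zip(lhs[1:], rhs[1:])) + eqs    # decompose
        else:
            return None                                # clash
    return subst


# p(z, f(x, y)) = p(f(z, y), z)
eq = (("p", "z", ("f", "x", "y")), ("p", ("f", "z", "y"), "z"))
print(unify_no_occur_check([eq]))
```

On this input the sketch goes through the same six lines as the table and returns the bindings $z \mapsto f(z, y)$ and $x \mapsto z$.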

Let $\sigma = \{z \mapsto f(z,y), x \mapsto z\}$ be the computed substitution. Then, we have

$$vars(x\sigma) = vars(z) = \{z\},$$
$$vars(x\sigma^2) = vars(f(z,y)) = \{y, z\}.$$

Hence $\sigma$ is not variable-idempotent.

We conjecture that the resulting substitution is still unique (up to variable
renaming). In this case our results can be applied so that its abstraction using
$\alpha$, as defined in this paper, is also unique and aunify is sound.



5.2 Idempotence 

Definition 8 defines aunify inductively over a set of equations, so that it is im- 
portant for this definition that aunify is both idempotent and commutative. 

The only previous result concerning the idempotence of aunify is given in
the thesis of Langen [14, Theorem 32]. However, the definition of aunify in [14]
includes the renaming and projection operations and, in this case, only a weak 
form of idempotence holds. In fact, for the basic aunify operation as defined 
here and without projection and renaming, idempotence has never before been 
proven. 



5.3 Commutativity 

In the thesis of Langen the “proof” of commutativity of amgu has a number of
omissions and errors [14, Lemma 30]. We highlight here one error which we were
unable to correct in the context of the given proof.

To make it easier to compare, we adapt our notation and define amge only
in the case that $a$ is a variable:

$$amge(a, b, sh) := amgu(sh, a \mapsto b).$$

To prove the lemma, it has to be shown that

$$amge(a_2, b_2, amge(a_1, b_1, sh)) = amge(a_1, b_1, amge(a_2, b_2, sh))$$

holds when $a_1$ and $a_2$ are variables. This corresponds to “the second base case”
of the proof. We use Langen’s terminology:

- A set of variables $X$ is at a term $t$ iff $var(t) \cap X \neq \emptyset$.
- A set of variables $X$ is at $i$ iff $X$ is at $a_i$ or $b_i$.
- A union $X \cup_i Y$ is of Type $i$ iff $X$ is at $a_i$ and $Y$ is at $b_i$.

Let $lhs := amge(a_2, b_2, amge(a_1, b_1, sh))$, and $rhs := amge(a_1, b_1, amge(a_2, b_2, sh))$.
Let also $Z \in lhs$ and $T := amge(a_1, b_1, sh)$. Consider the case when

$$Z = X \cup_2 Y \text{ where } X \in rel(a_2, T),\ Y \in rel(b_2, T),$$
$$X = U \cup_1 V \text{ where } U \in rel(a_1, sh),\ V \in rel(b_1, sh)$$

and $U \cap (vars(a_2) \cup vars(b_2)) = \emptyset$ (that is, $U$ is not at 2). Then the following
quote [14, page 53, line 23] applies:

    In this case $(U \cup_1 V) \cup_2 Y = U \cup_1 (V \cup_2 Y)$. By the inductive assumption
    $V \cup_2 Y$ is in the rhs and therefore so is $Z$.

We give a counter-example to the statement “$V \cup_2 Y$ is in the rhs”.

Suppose $a_1, b_1, a_2, b_2$ are variables. We let each of $a_1, b_1, a_2, b_2$ denote both
the actual variable and the singleton set containing that variable. Suppose
$sh = \{a_1, b_1a_2, b_2\}$. Then, from the definition of amge,

$$lhs = \{a_1b_1a_2b_2\}, \quad rhs = \{a_1b_1a_2b_2\}, \quad T = \{a_1b_1a_2, b_2\}.$$

Let $Z = a_1b_1a_2b_2$, $X = a_1b_1a_2$, $Y = b_2$, $U = a_1$, $V = b_1a_2$. All the above
conditions hold. However $V \cup_2 Y = b_1a_2b_2$ and this is not in $\{a_1b_1a_2b_2\}$.
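
The counter-example is small enough to be checked mechanically. The sketch below, offered only as a sanity check, re-implements amgu from Definitions 5 and 6 on frozensets and evaluates both sides; the encoding of sharing groups and the Python names are assumptions of the sketch.

```python
def star(sh):
    """Closure under union of the sharing set sh."""
    closure = set(sh)
    changed = True
    while changed:
        changed = False
        for a in list(closure):
            for b in sh:
                if a | b not in closure:
                    closure.add(a | b)
                    changed = True
    return frozenset(closure)


def rel(T, sh):
    return frozenset(S for S in sh if S & frozenset(T))


def bin_union(sh1, sh2):
    return frozenset(a | b for a in sh1 for b in sh2)


def amgu(sh, x, t_vars):
    A, B = rel({x}, sh), rel(t_vars, sh)
    return (sh - (A | B)) | bin_union(star(A), star(B))


def amge(a, b, sh):
    # amge(a, b, sh) := amgu(sh, a -> b); here both a and b are variables
    return amgu(sh, a, {b})


group = lambda *names: frozenset(names)
sh = frozenset({group("a1"), group("b1", "a2"), group("b2")})

T = amge("a1", "b1", sh)
lhs = amge("a2", "b2", T)
rhs = amge("a1", "b1", amge("a2", "b2", sh))
V_union_Y = group("b1", "a2", "b2")

print(lhs == rhs, V_union_Y in rhs)
```

In this instance the two sides coincide, so commutativity itself is not contradicted; only the intermediate claim of the quoted proof fails.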
Abstract. We consider specifications of analysers expressed as compositions
of two functions: a semantic function, which returns a natural
semantics derivation tree, and a property defined by recurrence on deriva- 
tion trees. A recursive definition of a dynamic analyser can be obtained 
by fold/unfold program transformation combined with deforestation. A 
static analyser can then be derived by abstract interpretation of the dy- 
namic analyser. We apply our framework to the derivation of a dynamic 
backward slicing analysis for a logic programming language. 



1 Introduction 

A large amount of work has been devoted to program analysis during the last 
two decades, both on the practical side and on the theoretical issues. However, 
most of the program analysers that have been implemented or reported in the 
literature so far are concerned with one specific property, one specific language 
and one specific service (dynamic or static) . A few generic tools have been pro- 
posed but they are generally restricted to one class of properties or languages, 
or limited in their level of abstraction. We believe that there is a strong need for 
environments supporting the design of program analysers and that more effort 
should be put on the software engineering of analysers. 

We present a framework for designing analysers from operational specifica- 
tions by program transformation (folding/unfolding). The analysis specification 
has two components: a semantics of the programming language and a definition 
of the property. 

The advantage of this two-fold specification is that the definition of the prop- 
erty can be kept separate from the semantics of the programming language. Ide- 
ally, properties can be specified in terms of the derivation tree of the operational 
semantics. Specific analysers can then be obtained systematically by instantiat- 
ing semantics of the programming language. We focus here on slicing of logic 
programs. The general approach is detailed in [?].

Natural semantics [?] are a good starting point for the definition of analyses
because they are both structural (compositional) and intensional. They
are structural because the semantics of a phrase in the programming language 

G. Levi (Ed.): SAS’98, LNCS 1503, pp. 115- im 1998. 

(c) Springer-Verlag Berlin Heidelberg 1998
is derived from the semantics of subphrases; they are intensional because the
derivation tree that is associated with a phrase in the programming language 
contains the intermediate results (the semantics of subphrases) . These qualities 
are significant in the context of program analysis because compositionality leads 
to more tractable proof techniques and intensionality makes it easier to establish 
the connection between the result of the analysis and its intended use. 

Our semantics is defined formally as a function taking a term and an evalu- 
ation context and returning a derivation tree. The property itself is a function 
from derivation trees to a suitable abstract domain. The composition of these 
two functions defines a dynamic a posteriori analysis. It represents a function 
which initially calculates the trace of a complete execution (a derivation tree) of 
a program before extracting the required property. Program transformations via
extended folding/unfolding techniques and simplification rules make it possible to
obtain a recursive definition of the dynamic analyser (which does not call the property
function). This function is in fact a dynamic on the fly analyser in the sense that
it calculates the required property progressively during program execution. The 
following diagram shows the general organisation: 



$$Context \times Term \xrightarrow{\ Semantics\ } Tree \xrightarrow{\ Property\ } Result$$
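
To make the organisation above concrete, here is a deliberately tiny Python sketch. Everything in it is hypothetical and chosen only for brevity: the toy language (integer literals and additions, with no evaluation context), the property (the number of rule applications) and all names. The a posteriori analyser builds the whole derivation tree first; the on the fly analyser is a hand-fused version that computes the same property during evaluation, which is the kind of result the folding/unfolding derivation aims at.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    rule: str          # name of the rule applied
    term: object       # term being evaluated
    result: int        # normal form
    premises: tuple    # sub-derivations


def semantics(term):
    """Return the derivation tree for a toy term: int | ("add", e1, e2)."""
    if isinstance(term, int):
        return Tree("Num", term, term, ())
    _, e1, e2 = term
    t1, t2 = semantics(e1), semantics(e2)
    return Tree("Add", term, t1.result + t2.result, (t1, t2))


def prop(tree):
    """Property defined by recurrence on trees: number of rule applications."""
    return 1 + sum(prop(p) for p in tree.premises)


def a_posteriori(term):
    return prop(semantics(term))


def on_the_fly(term):
    """Fused analyser: returns (result, property) without building a tree."""
    if isinstance(term, int):
        return term, 1
    _, e1, e2 = term
    r1, c1 = on_the_fly(e1)
    r2, c2 = on_the_fly(e2)
    return r1 + r2, 1 + c1 + c2


example = ("add", 1, ("add", 2, 3))
print(a_posteriori(example), on_the_fly(example))
```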

The key points of the approach proposed here are the following: 



— The derivation is achieved in a systematic way by using functional transfor- 
mations: unfolding and folding. 

— It is applicable to a wide variety of languages and properties because it is 
based on natural semantics definitions. 



As mentioned before, some of the analyses that we want to specify are dy- 
namic and others are static. There is no real reason why these two categories of 
analyses should be seen as belonging to different worlds. In the paper we focus 
on dynamic analysis, considering that static analysis can be obtained in a second
stage as an abstract interpretation of the dynamic analysis as presented in [?].
We outline this derivation in the conclusion. Note that our approach introduces
a clear separation between the specification of an analysis (defined as a property 
on semantics derivation trees) and the algorithm that implements it. 



We illustrate the framework by the formal derivation of a slicing analysis for
a logic programming language. The different stages of the derivation are detailed
in the following sections. Section 2 introduces contexts, terms, derivation trees
and the semantics function. The abstract domain and the property function
are presented in Section 3. The transformation of the composition of the two
specification functions (the semantics and the property) into a dynamic on the
fly analyser is described in Section 4. Related work, conclusion and avenues for
further research are discussed in Section 5 and Section 6 respectively.

2 Natural Semantics



The natural semantics of a language is a set of axioms and inference rules that
define a relation between a context, a term in the programming language and a
result. A natural semantics derivation tree has the form:

$$\text{Proof-Tree} = [RN]\ \frac{\text{Proof-Tree}_1 \quad \ldots \quad \text{Proof-Tree}_n}{STT}$$

where RN is the name of the rule used to derive STT. The conclusion STT is a
statement, that is to say a triple consisting of a context, a term and a result.

Let C be the set of contexts, T the type of terms of the language and PT the
type of derivation trees; we have:

$$PT = STT \times (list\ PT) \times RN$$
$$STT = C \times T \times NF$$
$$T = PP \times I$$

Derivation trees are made of a statement (the conclusion), a list of derivation
trees (the premises) and the name of the rule applied to derive the conclusion. We
assume that a term is a pair of a program point and an expression. STT denotes
the type of statements, RN rule names, NF normal forms (program results), PP
program points and I expressions.
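
These type equations map directly onto record types. The following Python sketch is one possible encoding; the field names, the use of `Any` for contexts, expressions and normal forms, and the helper `rules_used` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Term:                 # T = PP x I
    point: str              # program point (PP)
    expr: Any               # expression (I)


@dataclass(frozen=True)
class Statement:            # STT = C x T x NF
    context: Any            # C
    term: Term              # T
    result: Any             # NF


@dataclass
class ProofTree:            # PT = STT x (list PT) x RN
    conclusion: Statement
    premises: List["ProofTree"]
    rule: str               # RN


def rules_used(tree):
    """List the rule names of a derivation, conclusion first."""
    names = [tree.rule]
    for premise in tree.premises:
        names.extend(rules_used(premise))
    return names


leaf = ProofTree(Statement("C0", Term("pi1", "x = t"), "C1"), [], "Eq")
root = ProofTree(Statement("C0", Term("pi0", "exists x. x = t"), "C2"),
                 [leaf], "Exists")
print(rules_used(root))
```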



The Semantics of a Logic Programming Language 

We assume a program Prog which is a collection of predicate definitions of the
form $[P_k(x_1, \ldots, x_n) = B_k]$. The body $B_k$ of a predicate is in normal form and it
contains only variables from $\{x_1, \ldots, x_n\}$. Normal forms are first order formulae
(also called “goal formulae” in [18]) built up from predicate applications using
only the connectives “and”, “or”, and “there exists”. Their syntax is defined by:

$$I ::= Op(x_1, x_2, x_3) \mid x = t \mid U_1 \wedge U_2 \mid U_1 \vee U_2 \mid \exists x.U_1 \mid P_k(y_1, \ldots, y_n)$$

where $Op$ stands for basic predicates^1 and $P_k$ for user-defined predicates. $U_i$ are
terms of type T. We assume that each variable $x$ occurring in a term $\exists x.U_1$ is
unique. In a program, each subterm in this syntax is associated with a program
point (using pairs).

As an illustration of this syntax, Figure 1 presents a small program in a logic
programming syntax and shows its translation into normal form. Program points
are represented by $\pi_i$. Note that some program points are omitted for the sake
of readability. The program defines two predicates P and Q. The main predicate
is Q. The recursive predicate P computes the length $n$ of the list $l$ of integers,
the sum $sum$ of the elements of the list, the maximum $max$ and the minimum
$min$ of the list $l$. The average $av$ of the list is computed by the predicate Q via
P (the value of $sum$ obtained by P is divided by the length $n$ of the list).

^1 We consider only ternary basic predicates here, but other arities are treated in the
same way.

Definition of the program in a logic programming syntax

P(nil, 1, 0, 0, 0)
P((x, nil), 1, x, x, x)
P((x, xs), n, sum, max, min) = ($\pi_1$, P(xs, n', sum', max', min'))
                               ($\pi_2$, Ad(n', 1, n))
                               ($\pi_3$, Ad(sum', x, sum))
                               ($\pi_4$, Max(max', x, max))
                               ($\pi_5$, Min(min', x, min))
Q(l, av, max, min) = ($\pi_6$, P(l, n, sum, max, min))
                     ($\pi_7$, Div(sum, n, av))

Normal form of the program

P(l, n, sum, max, min) =
    ((l = nil) $\wedge$ (n = 1) $\wedge$ (sum = 0) $\wedge$ (max = 0) $\wedge$ (min = 0)) $\vee$
    ($\exists$x. (l = (x, nil)) $\wedge$ (n = 1) $\wedge$ (sum = x) $\wedge$ (max = x) $\wedge$ (min = x)) $\vee$
    ($\exists$x. $\exists$xs. $\exists$n'. $\exists$sum'. $\exists$max'. $\exists$min'. l = (x, xs) $\wedge$
        ($\pi_1$, P(xs, n', sum', max', min')) $\wedge$
        ($\pi_2$, Ad(n', 1, n)) $\wedge$
        ($\pi_3$, Ad(sum', x, sum)) $\wedge$
        ($\pi_4$, Max(max', x, max)) $\wedge$
        ($\pi_5$, Min(min', x, min)))

Q(l, av, max, min) = ($\exists$n. $\exists$sum.
        ($\pi_6$, P(l, n, sum, max, min)) $\wedge$
        ($\pi_7$, Div(sum, n, av)))

Fig. 1. A simple logic program



Following [?], we assume an infinite set of program variables Pvar and an
infinite set of renaming variables Rvar. Terms and substitutions are constructed
using program variables and renaming variables. We distinguish two kinds of
substitutions: program variable substitutions (Subst) whose domain and co-domain
are subsets of Pvar and Rvar respectively, and renaming variable substitutions
(Rsubst) whose domain and co-domain are subsets of Rvar:

$$Subst = Pvar \rightarrow Rterm$$
$$Rsubst = Rvar \rightarrow Rterm$$

where Rterm represents a term constructed with renaming variables Rvar. By
convention, we use $\theta \in Subst$ for a program variable substitution and $\sigma \in Rsubst$
for a renaming variable substitution. The definition of substitution composition
is modified to take into account the role held by renaming variables. The modification
occurs when $\theta \in Subst$ and $\sigma \in Rsubst$: we have $\sigma \circ \theta \in Subst$ defined by:

$$dom(\sigma \circ \theta) = dom(\theta)$$
$$(\sigma \circ \theta)(x) = \sigma(\theta(x)) \text{ for all } x \in dom(\theta)$$

The domain of contexts for this language is defined by $C = Tree(Subst)$
where $Tree(H)$ is the type of binary trees with leaves of type $H$. We define
contexts as binary trees of substitutions to take into account the non-deterministic
nature of the language. So, we gather in one derivation the computation of
all the substitutions of a program. A particular control strategy for the
implementation of the language corresponds to a particular ordering of the leaves of
substitution trees. For instance, the list of results of the usual depth-first
evaluation strategy of Prolog is precisely the leaves of the substitution tree produced
by our semantics ordered from left to right. We write $N(T_1, T_2)$ for a tree with
subtrees $T_1$ and $T_2$.
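
The modified composition and the reading of a context as an ordered collection of answers can be sketched as follows. This is an illustration under assumed encodings: a substitution is a dict, a term is a string (variable) or a tuple headed by its functor, a tree node is a tuple tagged `"N"`, and the function names are hypothetical.

```python
def apply(t, sigma):
    """Apply a renaming-variable substitution sigma to a term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(a, sigma) for a in t[1:])


def compose(sigma, theta):
    """sigma o theta: same domain as theta, images rewritten by sigma."""
    return {x: apply(t, sigma) for x, t in theta.items()}


def N(left, right):
    """Binary node of a context tree."""
    return ("N", left, right)


def leaves(tree):
    """Leaves of a context tree, from left to right."""
    if isinstance(tree, tuple) and tree[0] == "N":
        return leaves(tree[1]) + leaves(tree[2])
    return [tree]


theta = {"X": "r1", "Y": ("f", "r2")}        # program variables -> Rterm
sigma = {"r1": ("a",), "r2": "r3"}           # renaming variables -> Rterm
print(compose(sigma, theta))

t1, t2, t3 = {"X": ("a",)}, {"X": ("b",)}, {"X": ("c",)}
print(leaves(N(N(t1, t2), t3)))              # left-to-right order of answers
```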



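$$[Op]\quad C \vdash (\pi, Op(x_1, x_2, x_3)) \rightarrow \overline{op}(C, x_1, x_2, x_3)$$

$$[Eq]\quad C \vdash (\pi, x = t) \rightarrow \overline{unif}(C, x, t)$$

$$[\wedge]\quad \frac{C \vdash U_1 \rightarrow R_1 \qquad R_1 \vdash U_2 \rightarrow R_2}{C \vdash (\pi, U_1 \wedge U_2) \rightarrow R_2}$$

$$[\vee]\quad \frac{C \vdash U_1 \rightarrow R_1 \qquad C \vdash U_2 \rightarrow R_2}{C \vdash (\pi, U_1 \vee U_2) \rightarrow union(C, R_1, R_2)}$$

$$[\exists]\quad \frac{\overline{Add}(C, x, rx) \vdash U_1 \rightarrow R_1}{C \vdash (\pi, \exists x.U_1) \rightarrow \overline{Drop}(R_1, x)} \qquad rx \in Rvar \text{ fresh variable}$$

$$[Call]\quad \frac{\overline{Ren_k}(C) \vdash B_k \rightarrow R_1}{C \vdash (\pi, P_k(y_1, \ldots, y_n)) \rightarrow \overline{Ext_k}(C, R_1)} \qquad \text{with } [P_k(x_1, \ldots, x_n) = B_k] \in Prog$$

$$\overline{F}(N(T_1, T_2), x_1, \ldots, x_n) = N(\overline{F}(T_1, x_1, \ldots, x_n), \overline{F}(T_2, x_1, \ldots, x_n))$$
$$\overline{F}(\theta, x_1, \ldots, x_n) = F(\theta, x_1, \ldots, x_n)$$

$$op(\theta, x_1, x_2, x_3) = \text{let } \sigma = [(\theta(x_1)\ op\ \theta(x_2)) / \theta(x_3)] \text{ in } \sigma \circ \theta$$
$$\qquad \text{if } \theta(x_1) \text{ and } \theta(x_2) \text{ are ground and } \theta(x_3) \in Rvar, \ \bot \text{ otherwise}$$
$$unif(\theta, x, t) = \text{let } \sigma = mgu(\theta(x), \theta(t)) \text{ in } \sigma \circ \theta \text{ if } \theta(x) \text{ and } \theta(t) \text{ can be unified}, \ \bot \text{ otherwise}$$

$$union(N(T_1, T_2), N(U_1, U_2), N(V_1, V_2)) = N(union(T_1, U_1, V_1), union(T_2, U_2, V_2))$$
$$union(\theta, U, V) = N(U, V)$$

$$Add(\theta, pv, rv) = \theta[rv/pv] \text{ with } v \neq pv \Rightarrow \theta[rv/pv](v) = \theta(v) \text{ and } \theta[rv/pv](pv) = rv$$
$$Drop(\theta, pv) = \theta/_{pv} \text{ with } v \neq pv \Rightarrow \theta/_{pv}(v) = \theta(v) \text{ and } \theta/_{pv}(pv) = \bot$$
$$Ren_k(\theta) = [\theta(y_i)/x_i]$$
$$Ext_k(\theta, \theta') = \sigma \circ \theta \text{ with } \theta' = \sigma \circ [\theta(y_i)/x_i]$$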

Fig. 2. Natural semantics of a logic programming language 
The natural semantics of a simple logic programming language, in the usual
inference rule presentation, is given in Figure 2. The normal forms calculated
by the rules are contexts. In the figure, $F(T)$ denotes the application of a
function $F$ to all the substitutions of a tree $T$; its result is also a tree.
The function $\mathit{op}$ represents the interpretation of the operator Op. The
substitution $\mathit{unif}(\theta, x, t)$ of $\mathit{Subst}$ is the result of
unifying $x$ and $t$ under $\theta$ (rule Eq). The rule $\wedge$ is not surprising:
the first formula $U_1$ of the conjunction is evaluated and the result $R_1$ is
taken as the new context for the evaluation of the second formula $U_2$ of the
conjunction; the result $R_2$ is the final result. For the rule $\vee$, the
subtrees corresponding to the subformulae of the disjunction are evaluated
independently. The function $\mathit{union}(T_1, T_2, T_3)$ is needed to build a
new substitution tree joining the trees $T_2$ and $T_3$ produced by the two
subgoals. Its first argument is the initial context, which is used to identify
the points where the joins have to be introduced (these points are the leaves of
$T_1$). At such a leaf, the initial substitution $\theta$ itself can be ignored
because substitutions are only added to contexts, generating new contexts. The
rule $\exists$ uses two functions, $\mathit{Add}$ and $\mathit{Drop}$.
$\mathit{Add}$ adds a program variable to a substitution (the new program variable
is bound to a fresh renaming variable) and $\mathit{Drop}$ removes a variable from
a substitution.
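
The lifting of a function to trees and the function $\mathit{union}$ can be
sketched as follows. This is only an illustration under the assumption that
substitutions are Python dictionaries; the names `lift` and `union` and the
sample data are choices of the example.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class N:
    left: Any
    right: Any

def lift(f, tree, *args):
    """Apply f to every substitution (leaf) of tree; the result is a tree."""
    if isinstance(tree, N):
        return N(lift(f, tree.left, *args), lift(f, tree.right, *args))
    return f(tree, *args)

def union(t1, t2, t3):
    """Join t2 and t3 at the points given by the leaves of the initial tree t1."""
    if isinstance(t1, N):
        return N(union(t1.left, t2.left, t3.left),
                 union(t1.right, t2.right, t3.right))
    return N(t2, t3)       # t1 is a leaf: its content is not needed

c = N({"x": 1}, {"x": 2})                      # initial context
r1 = lift(lambda th: {**th, "y": 0}, c)        # result of the first subgoal
r2 = lift(lambda th: {**th, "y": 1}, c)        # result of the second subgoal
joined = union(c, r1, r2)
```

Each leaf of the initial context is replaced by a node whose two subtrees are
the corresponding results of the two subgoals.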

For the rule Call, two further functions on substitutions are needed.
$\mathit{Ren}_k(C)$ creates a new substitution in which to execute the body of a
clause (it amounts to a variable renaming), because the body $B_k$ of a clause
contains formal parameters $x_i$ whereas $C$ contains program variables $y_i$.
$\mathit{Ext}_k(C, R_1)$ propagates the result of a predicate into the calling
substitutions, because $C$ contains variables $y_i$ and $R_1$ contains formal
parameters $x_i$. From the definition of $\mathit{Ren}_k$, we see that the body
$B_k$ of a predicate is evaluated in an environment defining exactly the formal
parameters of the predicate $P_k$.

The formal definitions of the functions introduced informally above are
presented at the bottom of Figure 2.

In order to make formal manipulations easier, we express the construction
of natural semantics derivation trees in a functional framework. The semantic
function $S$ is a partial function of type:

\[ C \times T \rightarrow PT \]

The important point about the type of the semantic function is that it returns
the whole natural semantics derivation tree, rather than just the result of the
program. This choice makes it easier to define intensional analyses. Describing
the semantics in a functional framework does not prevent us from dealing with
non-deterministic languages, as we show for a logic programming language: we can
use $NF$ and $C$ to represent sets of possible results and contexts.

We use the notation X.ty to denote the field of type TY of X. For example, 
we will make intensive use of the following expressions in the rest of the paper: 
expression     type        meaning
PT.stt         STT         conclusion of PT
PT.lpt         (list PT)   premisses of PT
PT.rn          RN          name of the rule used at the root of PT
PT.stt.c       C           context of the conclusion sequent of PT
PT.stt.t.i     I           term of the conclusion sequent of PT
PT.stt.t.pp    PP          program point of the conclusion sequent of PT
PT.stt.nf      NF          normal form of the conclusion sequent of PT
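
For readers who prefer a concrete reading of the field notation, the following
sketch models derivation trees as nested records. The class names mirror the
types of the table; the field types (`Any`, `str`) and the sample values are
assumptions of the example.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Term:
    pp: str      # program point (type PP)
    i: Any       # instruction (type I)

@dataclass
class STT:
    c: Any       # context (type C)
    t: Term      # term (type T)
    nf: Any      # normal form (type NF)

@dataclass
class PT:
    stt: STT           # conclusion
    lpt: List["PT"]    # premisses
    rn: str            # name of the rule used at the root (type RN)

leaf = PT(
    stt=STT(c={"x": 1}, t=Term("pi1", ("Eq", "y", 2)), nf={"x": 1, "y": 2}),
    lpt=[],
    rn="Eq",
)
```

With these records, an expression such as `leaf.stt.t.pp` reads exactly like
the corresponding entry `PT.stt.t.pp` of the table.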



The semantics in functional form is presented in Figure 3. The semantics
function of Figure 3 takes two arguments (the context $C$ and the term $T$)
and returns a derivation tree. The derivation tree contains the conclusion
$(C, T, R)$ of type $STT$, where $R$ is the result of the program, the list of
subtrees, and the name of the rule used to derive the conclusion. The body of the
function is a list of cases selected by pattern matching on the form of the term.
The function is defined by recursion on the term. The set of definitions
$\mathit{Prog}$ is used as an implicit parameter of the semantics.



S(C, T) = case T of
  (π, Op(x1, x2, x3))        : ((C, T, op(C, x1, x2, x3)), nil, Op)
  (π, Eq(x, t))              : ((C, T, unif(C, x, t)), nil, Eq)
  (π, And(U1, U2))           : let PT1 = S(C, U1)
                                   R1  = PT1.stt.nf
                                   PT2 = S(R1, U2)
                                   R2  = PT2.stt.nf
                               in ((C, T, R2), [PT1, PT2], ∧)
  (π, Or(U1, U2))            : let PT1 = S(C, U1)
                                   R1  = PT1.stt.nf
                                   PT2 = S(C, U2)
                                   R2  = PT2.stt.nf
                               in ((C, T, union(C, R1, R2)), [PT1, PT2], ∨)
  (π, Exists(x, U1))         : let PT1 = S(Add(C, x, rx), U1)
                                   R1  = PT1.stt.nf
                               in ((C, T, Drop(R1, x)), [PT1], ∃)
  (π, Call(P_k(y1, ..., yn))) : let PT1 = S(Ren_k(C), B_k)
                                    R1  = PT1.stt.nf
                                in ((C, T, Ext_k(C, R1)), [PT1], Call)
                                with [P_k(x1, ..., xn) = B_k] ∈ Prog

Fig. 3. The semantics function of a logic programming language
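
The shape of the semantics function can be made concrete on a small fragment.
The sketch below covers only the Op, And and Or cases and makes several
simplifying assumptions that are not in the definition above: substitutions are
dictionaries binding variables to integers, the only operator is an addition
named `Ad`, renaming variables are left out (the result variable must simply be
unbound), `None` plays the role of $\bot$, and the conclusion is flattened into
the record `PT` instead of a separate `stt` field.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class N:
    left: Any
    right: Any

@dataclass
class PT:
    c: Any       # context of the conclusion
    t: tuple     # term of the conclusion: (program point, instruction)
    nf: Any      # normal form of the conclusion
    lpt: list    # premisses
    rn: str      # name of the rule used at the root

def lift(f, tree, *args):
    if isinstance(tree, N):
        return N(lift(f, tree.left, *args), lift(f, tree.right, *args))
    return f(tree, *args)

def union(t1, t2, t3):
    if isinstance(t1, N):
        return N(union(t1.left, t2.left, t3.left),
                 union(t1.right, t2.right, t3.right))
    return N(t2, t3)

def op_add(theta, x1, x2, x3):
    # None stands for the failure substitution, which is absorbing.
    if theta is None or x1 not in theta or x2 not in theta or x3 in theta:
        return None
    return {**theta, x3: theta[x1] + theta[x2]}

def S(c, t):
    """Return the derivation tree of term t in context c."""
    pp, instr = t
    kind = instr[0]
    if kind == "Ad":
        _, x1, x2, x3 = instr
        return PT(c, t, lift(op_add, c, x1, x2, x3), [], "Op")
    if kind == "And":
        pt1 = S(c, instr[1])
        pt2 = S(pt1.nf, instr[2])      # second conjunct runs in the result of the first
        return PT(c, t, pt2.nf, [pt1, pt2], "And")
    if kind == "Or":
        pt1 = S(c, instr[1])
        pt2 = S(c, instr[2])           # both disjuncts run in the same context
        return PT(c, t, union(c, pt1.nf, pt2.nf), [pt1, pt2], "Or")
    raise ValueError(f"unsupported term: {kind}")

program = ("p0", ("And",
                  ("p1", ("Ad", "a", "b", "s")),
                  ("p2", ("Or",
                          ("p3", ("Ad", "s", "a", "u")),
                          ("p4", ("Ad", "s", "b", "u"))))))
result = S({"a": 1, "b": 2}, program)
```

The value `result` is the whole derivation tree; its normal form `result.nf` is
a tree with one leaf per alternative of the disjunction.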
3 Specification of a Slicing Property

Slicing a program consists in constructing a reduced version of the program
(called a program slice) containing only those statements that affect a given set
of variables at given program points (this set is called the slicing criterion). In
program debugging, slicing makes it possible for a software engineer to focus on
the relevant parts of the code. Slicing is also useful for testing, program
understanding and maintenance activities. Because of this diversity of applications,
different variations on the notion of slicing have been proposed, as well as a
number of methods to compute slices. First, a program slice can either be
executable or not. Producing an executable slice makes it possible to apply further
treatments to the result of the analysis. Another important distinction is
between static and dynamic slicing. In the first case, the slice is computed without
any assumption on the inputs, whereas the latter relies on some specific input
data. Slicing algorithms can also be distinguished by their direction. Backward
slicing identifies the statements of a program that may have some impact on
the criterion, whereas forward slicing returns the statements which may be
influenced by the criterion. In this paper, we consider dynamic backward slicing
with executable slices. Static slicing algorithms can be derived by abstract
interpretation of dynamic slicing analysers; this construction is sketched in the
conclusion. We can describe forward slicing analysers in a similar way, but
slicing analyses producing non-executable slices do not fit well into our framework,
since the specification of the analysis is a relation between the semantics of the
original program and the semantics of the slice, as presented in [?].

Slicing$^{2}$ was originally proposed by Weiser for imperative languages [?] and
its application to logic programming [?] and functional programming [?] has
been studied recently. In fact, the concept of slicing itself is very general: it is
not tied to one specific style of programming$^{3}$ and it can lead to dynamic as
well as static analysers [?].

A slicing analysis for a logic programming language (with programs in normal
form), according to a program point and a set of variables of interest, consists
in keeping only those sub-goals of the disjunctions of each clause (a clause
defines a predicate) that can affect the value of the variables of interest. If all
sub-goals of a formula of the disjunction are dropped, then this formula is dropped.
If all formulae of the disjunction of goals are dropped, then the clause is dropped.
Otherwise, the head of the clause defining the predicate is kept.

Let us take the program in normal form of Figure 1 to illustrate dynamic
backward slicing. We assume that we are interested only in the value of the
variable $av$ at the program point $\pi_7$. The set $\{(\pi_7, av)\}$ is called the
slicing criterion. The dynamic slice of the program is extracted for one particular
input. For instance, if we execute the predicate $Q$ with $nil$ as the initial value
of $l$, we get:

$^{2}$ More precisely, “backward slicing”.

$^{3}$ Even if the details of the resulting analyses are of course different.
P(nil, 1, 0, 0, 0)

Q(l, av, max, min) = (π6, P(l, n, sum, max, min))
                     (π7, Div(sum, n, av))

The predicate $P$ is not called recursively and the first disjunct is satisfied,
so the third clause of $P$ is never executed. The definition of the predicate $Q$
is kept because all its clauses are useful to compute the variable $av$. If we
consider the execution of the program with $(2, (3, nil))$ as the initial value of
$l$, the predicate $P$ is called recursively and we get:

P((x, nil), 1, x, x, x)

P((x, xs), n, sum, max, min) = (π1, P(xs, n', sum', max', min'))
                               (π2, Ad(n', 1, n))
                               (π3, Ad(sum', x, sum))

Q(l, av, max, min) = (π6, P(l, n, sum, max, min))
                     (π7, Div(sum, n, av))

Only a part of the third clause of the predicate $P$ is kept: the program points
$\pi_4$ and $\pi_5$ are dropped because they are not useful in computing $av$ (they
are needed to compute the values for $max'$ and $min'$).

Assuming a set of pairs $(\pi_i, v_i)$, where $\pi_i$ is a program point and $v_i$
a variable, a backward slicing analysis produces the slice computing, for each
point $\pi_i$, the same values as the initial program for the variable $v_i$.

In our framework, a property is expressed by a function which takes at least
one argument, a value in the co-domain of the semantics function (a derivation
tree of type $PT$), and whose result belongs to an abstract domain. The slicing
property takes an additional argument to represent the slicing criterion (of type
$PP \rightarrow \mathcal{P}(\mathit{Pvar})$) and the type of the result is
$\mathcal{P}(PP)$ because slices are represented by sets of program points. The
slicing criterion is represented in our approach by a mapping from program points
to relevant variables. The slicing property also needs extra information: we
introduce a set of variables of interest associated with a program point (this set
represents the variables whose values must be preserved for computing the
corresponding term). The initial value of the set is $\emptyset$. The property
propagates this information of type $\mathcal{P}(\mathit{Pvar})$, so the type of the
property is:

\[ \alpha_{sl} : PT \times (PP \rightarrow \mathcal{P}(\mathit{Pvar})) \times \mathcal{P}(\mathit{Pvar}) \rightarrow \mathcal{P}(PP) \times \mathcal{P}(\mathit{Pvar}) \]

The slicing property $\alpha_{sl}$ for the logic programming language is presented
in Figure 4. The property takes as arguments a derivation tree $PT$, plus two
additional parameters $RV \in PP \rightarrow \mathcal{P}(\mathit{Pvar})$ and
$D \in \mathcal{P}(\mathit{Pvar})$. The second argument $RV$ (for Relevant
Variables) is the slicing criterion mentioned above. A program point $\pi$
associated with a non-empty set $RV(\pi)$ is called an observation point. The third
argument $D$ of the property represents the set of variables whose values must be
preserved in the output context$^{4}$ (normal form) of the term, i.e. the set of
variables that must be preserved in the result $R_i$ of the evaluation of the
derivation tree $PT_i$. In the initial call, $D$ is the empty set. The function
$\alpha_{sl}$ is called recursively on the intermediate derivation trees ($PT_i$)
of the natural semantics and sets of observation variables.

The result of the property is a pair $(S, N)$ with $S \in \mathcal{P}(PP)$ and
$N \in \mathcal{P}(\mathit{Pvar})$. $S$ is the set of program points of the term
$T$ that must be kept in the slice and $N$ is the set of variables whose value must
be preserved in the input context$^{5}$. A program point must be kept in the slice
if it can influence an observation point or the value of a variable of $D$ in the
output context. The same condition applies to decide which variables must be
preserved in the input context. If the program point can be removed from the
slice, the result of the property is $(\emptyset, D)$, which means that no program
point is added to the slice and the variables whose values must be preserved in
the input context are the variables that are necessary in the output context.
Otherwise, the first component of the result of the property is
$\bigcup_i S_i \cup \{\pi\}$ because $\pi$ has to be added to the program points
collected in the subterms of $T$. The second component $N$ of the result is the set
of variables whose value must be preserved in the input context $C$. It contains at
least the set $D$ and the variables $RV(\pi)$ of the slicing criterion, so we
factorise this by setting $D' = D \cup RV(\pi)$ at the beginning of the definition
of the slicing property.

We assume that the definitions of $S_i$ and $N_i$ are not mutually recursive. The
definitions of the sets of observation variables (third argument of $\alpha_{sl}$)
do not use $N_j$, $j > i$. Note that this is a characteristic feature of a backward
analysis.

In Figure 4, the relation $\mathit{Indep}(C, D_1, D_2)$ is used to ensure that two
sets of variables $D_1$ and $D_2$ are independent, which is the case when they do
not share any renaming variables (in any substitution of the context $C$). The
relation $\mathit{Indep}$ appears in the first two cases as a necessary condition to
exclude the term from the slice. If the relation holds, then the (renaming
variable) substitution resulting from the evaluation of the term cannot have any
impact on the variables of $D$. The relation $\mathit{UF}(C, x, t)$ is satisfied if
the unification of $x$ and $t$ cannot fail for any substitution of $C$. It is a
prerequisite for excluding $\mathrm{Eq}(x, t)$ from the slice because a failure is
recorded in the substitution tree as the $\bot$ substitution$^{6}$; as a
consequence, it has an impact on all the variables. This condition is not included
in the Op case, assuming that the logic programming language is equipped with mode
annotations ensuring that operators are always called with their first two
arguments ground and the last one free$^{7}$. In both the Op and the Eq cases, the
set of necessary variables (at the input of the program point) is $D'$ extended
with all the program variables of the term: the set $\{x_1, x_2, x_3\}$ for
$\mathrm{Op}(x_1, x_2, x_3)$, and the set of program variables occurring in $t$
together with $x$ for the rule $\mathrm{Eq}(x, t)$. The formal definitions of
$\mathit{Indep}$ and $\mathit{UF}$ are presented at the bottom of Figure 4.

$^{4}$ For a forward property this argument would characterise the input context
rather than the output context.

$^{5}$ For a forward property this argument would characterise the output context
rather than the input context.

$^{6}$ Note that $\bot$ is an absorbing element for the semantics of the language.
For instance $\mathit{op}(\bot, x_1, x_2, x_3) = \bot$ and
$\mathit{unif}(\bot, x, t) = \bot$.

$^{7}$ Otherwise an extra condition based on $\mathit{UF}$ can be added as in the
Eq case.
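
The independence test can be sketched as follows, reading it as "the two sets of
variables share no renaming variable in any substitution of the context". The
encoding is an assumption of the example: the context is given as the list of its
leaves, substitutions are dictionaries, compound terms are tuples and renaming
variables are instances of a hypothetical class `RVar`. Failure substitutions
are not handled in this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RVar:
    """A renaming variable (hypothetical encoding)."""
    name: str

def rv(term):
    """Set of renaming variables occurring in a term."""
    if isinstance(term, RVar):
        return {term}
    if isinstance(term, tuple):            # (functor, arg1, ..., argN)
        found = set()
        for arg in term[1:]:
            found |= rv(arg)
        return found
    return set()                           # constant

def indep(substitutions, d1, d2):
    """True when d1 and d2 share no renaming variable in any substitution."""
    for theta in substitutions:
        rv1 = set().union(*(rv(theta[x]) for x in d1 if x in theta))
        rv2 = set().union(*(rv(theta[x]) for x in d2 if x in theta))
        if rv1 & rv2:
            return False
    return True

r1, r2 = RVar("r1"), RVar("r2")
ctx = [{"x": ("cons", r1, "nil"), "y": r1, "z": r2}]
```

Here `x` and `y` are not independent because both terms contain `r1`, whereas
`x` and `z` are.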

For the rule And, both branches are processed in turn (the second branch first,
since our property is computed in a backward direction). The property is first
called with $PT_2$ and $D'$, and the result is $(S_2, N_2)$; then the property is
computed with $PT_1$ and $N_2$, which gives $(S_1, N_1)$ as the result. The program
point $\pi$ can be removed from the slice when both $S_1$ and $S_2$ are empty sets
(and $\pi$ is not an observation point). When the program point is kept, the result
for And is $(S_1 \cup S_2 \cup \{\pi\}, N_1)$ because the information about program
points of both branches is kept and the set $N_1$ represents the variables that must
be preserved in the input context, since we consider a backward direction.

The treatment of Or is different: the term is systematically kept in the slice
because it always influences the values of all the variables (through the
introduction of subtrees in the derivation tree). Both branches are computed
independently and the result gathers the information of these two branches.

The rules for Exists and Call are not surprising. We assume that the variable
$x$ in $\mathrm{Exists}(x, U_1)$ is unique in a normalised program, so $x$ can be
removed from the set of necessary variables yielded by the analysis of $U_1$
(hence $N_1 - \{x\}$).

In the rule for Call, the property is first computed on the derivation tree
corresponding to the predicate $P_k$, with the set
$\{x_i \mid \neg\mathit{Indep}(C, D', \{y_i\})\}$ of variables to be preserved
(i.e. the formal parameters $x_i$ of $P_k$ bound to arguments $y_i$ which are not
independent from the set $D'$). The test in the rule for Call is similar to the
test in the Op case. We could make more sophisticated choices to avoid including
all the variables $y_1, \ldots, y_n$ in the set of necessary variables.

4 Derivation of the Dynamic on the Fly Analyser 

We have presented the semantics function $S$ in section 2 and the property
$\alpha_{sl}$ in functional form in section 3. The general organisation is
described by the following diagram:

\[ C \times T \xrightarrow{\;S\;} PT \xrightarrow{\;\alpha_{sl}\;} D_\alpha \]

The composition of the property $\alpha_{sl}$ and the semantics $S$ is a function of
type $C \times T \rightarrow D_\alpha$, where $D_\alpha$ is the domain of abstract
values, the result of the analysis. This function computes successively the
derivation tree related to a program, then the property of interest for this tree.
It corresponds to an a posteriori dynamic analysis that inspects the trace produced
after the program execution. It is interesting to formally describe dynamic
analysers, because they are useful for instrumentation or debugging. We could also
prefer dynamic analyses which calculate their result on the fly, i.e. during
program execution. Their advantage is that they do not have to memorise traces
before analysing them.
α_sl(PT, RV, D) =
  let π  = PT.stt.t.pp
      C  = PT.stt.c
      D' = D ∪ RV(π)
  in case (PT.lpt, PT.rn) of

    (nil, Op) : let Op(x1, x2, x3) = PT.stt.t.i
                in if RV(π) = ∅ and Indep(C, D, {x3})
                   then (∅, D)
                   else ({π}, D' ∪ {x1, x2, x3})

    (nil, Eq) : let Eq(x, t) = PT.stt.t.i
                in if RV(π) = ∅ and UF(C, x, t) and Indep(C, D, Pv(t) ∪ {x})
                   then (∅, D)
                   else ({π}, D' ∪ Pv(t) ∪ {x})

    ([PT1, PT2], ∧) : let (S2, N2) = α_sl(PT2, RV, D')
                          (S1, N1) = α_sl(PT1, RV, N2)
                      in if RV(π) = ∅ and S1 ∪ S2 = ∅
                         then (∅, D)
                         else (S1 ∪ S2 ∪ {π}, N1)

    ([PT1, PT2], ∨) : let (S2, N2) = α_sl(PT2, RV, D')
                          (S1, N1) = α_sl(PT1, RV, D')
                      in (S1 ∪ S2 ∪ {π}, N1 ∪ N2)

    ([PT1], ∃) : let Exists(x, U1) = PT.stt.t.i
                     (S1, N1) = α_sl(PT1, RV, D')
                 in if RV(π) = ∅ and S1 = ∅
                    then (∅, D)
                    else (S1 ∪ {π}, N1 − {x})

    ([PT1], Call) : let Call(P_k(y1, ..., yn)) = PT.stt.t.i
                        (S1, N1) = α_sl(PT1, RV, {x_i | ¬Indep(C, D', {y_i})})
                    in if RV(π) = ∅ and S1 ∪ N1 = ∅ and Indep(C, D, {y1, ..., yn})
                       then (∅, D)
                       else (S1 ∪ {π}, D' ∪ {y1, ..., yn})

UF(C, x, t)       = ∀θ ∈ C. ∃σ = mgu(θ(x), θ(t))
Pv(t)             = set of program variables occurring in t
Rv(rt)            = set of renaming variables occurring in rt
Indep(C, D1, D2)  = ∀θ ∈ C. {Rv(θ(x)) | x ∈ D1} ∩ {Rv(θ(x)) | x ∈ D2} = ∅

Fig. 4. Slicing property
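
The backward propagation of the Op and And cases can be illustrated on a
hand-built derivation tree. The sketch below is not the full property: it covers
two cases only, flattens the conclusion into the record `PT`, and replaces
$\mathit{Indep}$ by a crude stand-in, `name_indep`, which compares variable names
and therefore assumes that distinct program variables never share a renaming
variable. The operator names `Max` and the program points `pi_a` and `pi_b` are
invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class PT:
    pp: str                      # program point of the conclusion
    instr: tuple                 # instruction of the conclusion
    rn: str                      # rule name: "Op" or "And"
    c: object = None             # context of the conclusion (unused by name_indep)
    lpt: list = field(default_factory=list)

def name_indep(c, d1, d2):
    """Stand-in for Indep: independent when no variable name is shared."""
    return not (d1 & d2)

def alpha_sl(pt, rv, d, indep=name_indep):
    pi = pt.pp
    relevant = rv.get(pi, set())
    d1 = d | relevant                              # D' = D U RV(pi)
    if pt.rn == "Op":
        _, x1, x2, x3 = pt.instr
        if not relevant and indep(pt.c, d, {x3}):
            return set(), d                        # pi is dropped
        return {pi}, d1 | {x1, x2, x3}
    if pt.rn == "And":
        pt1, pt2 = pt.lpt
        s2, n2 = alpha_sl(pt2, rv, d1, indep)      # second branch first
        s1, n1 = alpha_sl(pt1, rv, n2, indep)
        if not relevant and not (s1 | s2):
            return set(), d
        return s1 | s2 | {pi}, n1
    raise ValueError(f"unsupported rule: {pt.rn}")

p_sum = PT("pi3", ("Ad", "sum1", "x", "sum"), "Op")
p_max = PT("pi4", ("Max", "max1", "x", "max"), "Op")
p_div = PT("pi7", ("Div", "sum", "n", "av"), "Op")
inner = PT("pi_b", ("And",), "And", lpt=[p_max, p_div])
tree = PT("pi_a", ("And",), "And", lpt=[p_sum, inner])

slice_points, needed = alpha_sl(tree, {"pi7": {"av"}}, set())
```

With the criterion $\{(\pi_7, av)\}$, the point computing the maximum is dropped
while the points computing the sum and the division are kept, which is the
behaviour described for the example program.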
The derivation of a dynamic on the fly analyser from a dynamic a posteriori
analyser presents similarities with a well-known program transformation within
the framework of functional programming. The transformation is called
deforestation [?] and its purpose is to eliminate the intermediate data structures
induced by the composition of recursive functions. Here, the intermediate
structure is the derivation tree of the natural semantics. We use folding and
unfolding transformations to carry out deforestation. The three principal
operations are the following:

– unfoldings: we set $\mathcal{V}_\alpha(C, T) = \alpha_{sl}(S(C, T))$ and replace
  in the expression the calls to the recursive functions $\alpha_{sl}$ and $S$ by
  their definitions.

– applications of laws on the operators of the language (like the conditional,
  case and let expressions).

– foldings, which consist in replacing the occurrences of
  $\alpha_{sl}(S(C', T'))$ by calls to $\mathcal{V}_\alpha(C', T')$.

The goal of these transformations is to remove all the calls to the property
extraction function $\alpha_{sl}$, to obtain a closed definition of
$\mathcal{V}_\alpha(C, T)$. The function obtained is then a dynamic on the fly
analyser since it no longer builds the intermediate derivation trees.
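
The effect of the transformation can be seen on a deliberately tiny example,
unrelated to slicing. The toy semantics `S` below evaluates arithmetic
expressions and returns a derivation tree; the toy property `alpha` counts the
multiplications whose result is zero. Both functions, and the choice of returning
the value together with the property in the fused version, are assumptions of the
sketch and not the definitions used in this paper.

```python
def S(expr):
    """Toy semantics: derivation tree (value, rule name, premisses)."""
    if isinstance(expr, int):
        return (expr, "Const", [])
    op, left, right = expr
    pt1, pt2 = S(left), S(right)
    value = pt1[0] + pt2[0] if op == "+" else pt1[0] * pt2[0]
    return (value, op, [pt1, pt2])

def alpha(pt):
    """Toy property: number of multiplications that produced zero."""
    value, rule, premisses = pt
    here = 1 if rule == "*" and value == 0 else 0
    return here + sum(alpha(p) for p in premisses)

def analyse_a_posteriori(expr):
    # Composition: build the whole tree, then inspect it.
    return alpha(S(expr))

def analyse_on_the_fly(expr):
    """Result of unfolding alpha and S, then folding: no tree is built."""
    if isinstance(expr, int):
        return expr, 0
    op, left, right = expr
    v1, a1 = analyse_on_the_fly(left)
    v2, a2 = analyse_on_the_fly(right)
    value = v1 + v2 if op == "+" else v1 * v2
    here = 1 if op == "*" and value == 0 else 0
    return value, here + a1 + a2

e = ("+", ("*", 0, 5), ("*", 2, ("+", 3, -3)))
```

Both analysers compute the same property, but only the first one allocates the
intermediate tree.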

The partial correctness of the transformation by folding/unfolding is obvious.
Total correctness is not guaranteed in general because some inopportune foldings
can introduce cases of non-termination. The Improvement Theorem in [?] can
be extended to a method (the extended improved unfold-fold method), presented in
[?], which makes it possible to show the total correctness of the method proposed
in this paper.



Dynamic Slicing Analyser 



The definition of the dynamic slicing analyser for the logic programming
language is the following:

\[ \mathcal{SL}_d(C, T, RV, D) = \alpha_{sl}(S(C, T), RV, D) \]

First, we use an unfolding technique applied to the semantics and the property
functions. We present in [?] the transformation rules used for the derivation of
the dynamic on the fly analyser by unfolding. Figure 5 presents these unfoldings
for two rules (the other cases are straightforward).

To obtain a dynamic on the fly analyser, we must apply folding steps that
allow us to remove the calls to the function $\alpha_{sl}$. Figure 6 presents the
result of these foldings. The fact that $\mathcal{SL}_d$ itself calls $S$ shows
that it is a dynamic analysis.
SL_d(C, T, RV, D) =
  let D' = D ∪ RV(T.pp)
  in case T of

    (π, Op(x1, x2, x3)) : if RV(π) = ∅ and Indep(C, D, {x3})
                          then (∅, D)
                          else ({π}, D' ∪ {x1, x2, x3})

    (π, And(U1, U2))    : let PT1 = S(C, U1)
                              R1 = PT1.stt.nf
                              (S2, N2) = α_sl(S(R1, U2), RV, D')
                              (S1, N1) = α_sl(S(C, U1), RV, N2)
                          in if RV(π) = ∅ and S1 ∪ S2 = ∅
                             then (∅, D)
                             else (S1 ∪ S2 ∪ {π}, N1)

Fig. 5. Unfoldings of semantics and property functions



SL_d(C, T, RV, D) =
  let D' = D ∪ RV(T.pp)
  in case T of

    (π, Op(x1, x2, x3)) : if RV(π) = ∅ and Indep(C, D, {x3})
                          then (∅, D)
                          else ({π}, D' ∪ {x1, x2, x3})

    (π, And(U1, U2))    : let PT1 = S(C, U1)
                              R1 = PT1.stt.nf
                              (S2, N2) = SL_d(R1, U2, RV, D')
                              (S1, N1) = SL_d(C, U1, RV, N2)
                          in if RV(π) = ∅ and S1 ∪ S2 = ∅
                             then (∅, D)
                             else (S1 ∪ S2 ∪ {π}, N1)

Fig. 6. Dynamic (on the fly) slicing analysis
5 Related Work

The fold/unfold transformation framework used here is based on seminal work
by Burstall and Darlington [?]. The application of the technique to the derivation
of programs has also been investigated in [?], which presents the synthesis
of several sorting algorithms. The initial specification is expressed in terms of
sets and predicate logic constructs. Our transformations are also reminiscent
of the deforestation technique [?]: in both cases the goal is to transform a
composition of recursive functions into a single recursive definition.

Generic frameworks for program analysis have been proposed in the context
of logic programming languages and data flow analysis [?]. They rely
on abstract interpretations of denotational semantics [?] or interpreters [?],
and genericity is achieved by parameterising the abstract domains and choosing
appropriate abstract functions. The implementation details of the analysis
algorithm can be factorised. While these tools may attain a higher degree of
mechanisation than our framework, they do not offer the user the same level
of abstraction: they take as input the specification of an abstract interpreter
rather than the specification of a property. Despite this difference in point of
view, all these works are obviously inspired by the same goals. The framework
introduced in [?] is closer to the spirit of the work presented in this paper but
the technique itself is quite different. Programs are represented as models in
a modal logic and a data flow analysis can be specified as a property in the
logic. An efficient data flow analyser can be generated by partially evaluating
a specific model checker with respect to the specifying modal formula. In
comparison with this work, our framework trades mechanisation for generality:
it is not limited to data flow analyses but the derivation process by fold/unfold
transformations is not fully automatic.

Few papers have been devoted to the semantics of program slicing so far. A
relationship between the behaviour of the original program and the behaviour
of the slice is proved in [?]. The semantics of the language is expressed in terms
of program dependence graphs; thus the programs are first analysed in order
to extract their dependences. This approach is well suited to the treatment of
imperative languages. Formal definitions and a classification of different notions
of slicing are provided in [?]. The main distinctions are backward vs forward
analysers, executable vs non-executable slices, and dynamic vs static analysers.
Their definitions are based on denotational semantics and they focus on the
specifications of the analyses. In [?], a description of a family of slicing
algorithms generalising the notions of dynamic and static slice to that of a
constrained slice is presented. Genericity with respect to the programming
language is achieved through a translation into an intermediate representation
called PIM. Programs are represented as directed acyclic graphs whose semantics is
defined in terms of rewriting rules. Slicing is carried out using term graph
rewriting with a technique for tracing dynamic dependence relations. It should be
noted that a richer notion of slicing has been proposed for logic programming
languages, which returns not only the set of program points that must be kept in
the slice, but also the necessary variables at each program point [?]. This
increased precision can also be expressed in our framework, but we preferred to
present the simpler version here for the sake of size and readability. By
collecting the following information

\[ \Big(\bigcup_i S_i + \{(\pi, D \cup N)\},\; N\Big) \]

we can straightforwardly modify each rule in order to get the same precision as
[?].

6 Conclusion 

We have presented a method to derive dynamic analysers by program
transformation (folding/unfolding). A dynamic analyser is expressed as the
composition of a semantics function and a property function. This analyser is
called a posteriori: it is a function that first computes a complete program
execution trace (derivation tree) and then extracts the property of interest. A
recursive definition of the analyser can be obtained by program transformation.
The resulting function is a dynamic on the fly analyser that computes the property
during program execution.

We have focussed on dynamic analysis in the body of the paper. Our generic
dynamic analyser is defined in a strongly typed functional language$^{8}$. As a
consequence, we can rely on previous results on logical relations and abstract
interpretation [?] in order to systematically construct static analysers from the
dynamic analysers. The first task is to provide abstract domains for the static
slicing analyser and the corresponding abstraction functions. We recall that the
type of the dynamic analyser is
$C \times T \times (PP \rightarrow \mathcal{P}(\mathit{Pvar})) \times \mathcal{P}(\mathit{Pvar}) \rightarrow \mathcal{P}(PP) \times \mathcal{P}(\mathit{Pvar})$.
Since $PP \rightarrow \mathcal{P}(\mathit{Pvar})$, $\mathcal{P}(\mathit{Pvar})$ and
$\mathcal{P}(PP)$ are already abstract domains associated with the dynamic analysis,
only $C$ needs to be abstracted$^{9}$. The next stage to derive a correct static
analyser is to find appropriate abstractions for the constants and operators
occurring in the definition of the analyser. It is shown in [?] that the
correctness of the abstract interpretation of the constants and operators of the
language entails the correctness of the abstract interpretation of the whole
language. The correctness of the abstract interpretation means that the results of
the dynamic analysis and the static analysis are related if their arguments are.
In fact, it is possible to define the most precise abstraction for each constant
and operator of the language [?]. The basic idea to find the best abstraction of an
operator $op$, applied to abstract arguments $u_1, \ldots, u_n$, is to define it as
the least upper bound of the abstractions of all the results of $op$ applied to
arguments $v_i$ belonging to the concretisation sets of the $u_i$. The technique
sketched here provides a systematic way to construct a correct abstract
interpretation, and thus to derive a static analyser from a dynamic analyser [?].
By deriving static analysers as abstractions of dynamic analysers, we can see the
dynamic analyser either as an intermediate stage in the derivation of a static
analyser (playing a role similar to a collecting semantics) or as the final product
of the derivation.

$^{8}$ Note that the typing mentioned here has nothing to do with the language in
which the analysed programs are written; this language itself can perfectly well be
untyped.

$^{9}$ Of course, as usual in abstract interpretation,
$PP \rightarrow \mathcal{P}(\mathit{Pvar})$, $\mathcal{P}(\mathit{Pvar})$ and
$\mathcal{P}(PP)$ can also be abstracted if further approximations are needed, but
we do not consider this issue here.
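
The construction of a best abstraction as a least upper bound can be sketched on
a domain unrelated to slicing. The example assumes a flat sign domain
(`"bot"`, `"neg"`, `"zero"`, `"pos"`, `"top"`) and, so that concretisation sets
can be enumerated, a small finite range of integers standing in for the concrete
values; both choices, and all the function names, belong to the illustration only.

```python
from itertools import product

UNIVERSE = range(-3, 4)      # finite stand-in for the integers (assumption)

def alpha_val(v):
    """Abstraction of a single concrete value."""
    return "neg" if v < 0 else "zero" if v == 0 else "pos"

def gamma(a):
    """Concretisation of an abstract value, restricted to UNIVERSE."""
    if a == "bot":
        return set()
    if a == "top":
        return set(UNIVERSE)
    return {v for v in UNIVERSE if alpha_val(v) == a}

def lub(a, b):
    """Least upper bound in the flat sign domain."""
    if a == "bot":
        return b
    if b == "bot":
        return a
    return a if a == b else "top"

def best_abstraction(op, *abstract_args):
    """Lub of the abstractions of op over all concretisations of the arguments."""
    result = "bot"
    for values in product(*(gamma(a) for a in abstract_args)):
        result = lub(result, alpha_val(op(*values)))
    return result

def add(x, y):
    return x + y

def mul(x, y):
    return x * y
```

For instance, the sum of two positive values is abstracted by `"pos"`, whereas
the sum of a positive and a negative value can only be abstracted by `"top"`.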

The theory of abstract interpretation [?] provides a strong formal basis for
static program analysis. The work described here does not provide an alternative
formal underpinning for program analysis. Its goal is rather to put forward a
derivation approach for the design of analysers from high level specifications.

Our framework is applicable to a wide variety of languages, properties and
types of service (dynamic or static). We have proposed in the body of the paper
a formal definition of a dynamic slicing analyser for a logic programming
language. To our knowledge, this definition is the first one to be formal, so the
benefit of our approach is striking in this case. In [?], we present the derivation
of dynamic and static analysers for a strictness analysis of a higher-order
functional language and a live variable analysis for an imperative language. We
have also applied this work to a globalisation analysis of a higher-order
functional language and a generic sharing analysis. Pushing our approach even
further, we arrive at a natural semantics format and a format for slicing, as
presented in [?]. We have shown the correctness of the slicing property format.
These formats can be instantiated for several programming languages (imperative,
logic and functional). The slicing property for the logic programming language
that we have presented here is an instantiation of the slicing format.

As mentioned in the introduction, we wanted to establish the connection
between the result of the analysis and its intended use. Analyses are generally
performed to check assumptions about the behaviour of the program at specific
points of its execution or to enable program optimisations. In both cases the
intention of the analysis can be expressed in terms of a transformation and a
relation, as presented in [?]. The transformation depends on the result of the
analysis and the relation establishes a correspondence between the semantics of
the original program and the transformed program. For example, in the case of
a program analysis for compiler optimisation, the transformation expresses the
optimisation that is allowed by the information provided by the analysis and
the relation is the equality between the final results (or outputs) of the original
and the transformed program. The relation is not always the equality: a
counter-example is the slicing analysis described in this paper (because
the new program is required to behave like the original one only with respect
to specific program points and variables). We have formally defined and proved
in [?] a property for the intention of a slicing analysis, but space considerations
prevent us from presenting the intentional property for slicing.

There is one main aspect in which the work described here may seem limited:
we have used only natural semantics and terminating programs. Structural
Operational Semantics (SOS) are more precise than natural semantics and they
are required for a proper treatment of non-determinism, non-termination and
parallelism [?]. In fact, the natural semantics introduced in section 2 can be
replaced by SOS without difficulty and the dynamic analyses can be defined
in the very same way. The extra difficulty introduced by SOS is the fact that
they create new program fragments, which makes it necessary to abstract over
the syntax of the language to derive a static analyser. This problem is discussed
in [?]. We can also adapt our natural semantics to SOS by using the technique
presented in [?]. To achieve this goal, the classical inductive interpretation of
natural semantics has to be extended with coinduction mechanisms and rules
must be defined to express divergence.

Abstract. We present a finite symbolic semantics of value-passing con-
current processes, that can be suitably interpreted over abstract values
to compute a lower approximate semantics of full $\mu$-calculus. The main
feature of the semantics is that classical branching is replaced by ex-
plicit relations of non-deterministic and alternative choices among tran-
sitions. A combination of safe upper and lower approximations of the
basic operators of the logic is used to handle negation. The relations of
non-deterministic and alternative choices turn out to be very useful for
the dual approximations of the existential next modality.

Key words: Model checking, $\mu$-calculus, abstract interpretation.



1 Introduction 



Model Checking is a very successful technique for the automatic verification of 
temporal properties of reactive and concurrent systems, but it is only applicable 
to finite-state systems. Over the past few years, abstract interpretation has been 
widely applied to handle large as well as infinite systems with model checking 
[?]. Abstract interpretation [?] was originally conceived
in the framework of data-flow analysis for designing approximate semantics of 
programs and relies on the idea of obtaining an approximate semantics from the 
standard one by substituting the concrete domain of computation and its basic 
operations with an abstract domain and corresponding abstract operations. The 
typical approach consists of constructing an abstract model over a chosen set 
of abstract states that can be used in model checking instead of the concrete 
one. To this aim the abstract model has to be safe, namely the formulas sat- 
isfied by the abstract model have to hold in the concrete one. For branching 
time logics the definition of a safe abstract transition relation among abstract 
states presents some basic difficulties and a single abstract transition relation 
cannot preserve both the existential and the universal next modality. Several 
authors [?] propose to adopt two different abstract transition relations: a
free transition relation for computing the universal next modality, and a con-
strained transition relation for computing the existential next modality. Safeness 
of the free transition relation is ensured, if every concrete transition induces 
a free transition among the corresponding abstract states. In contrast, a con- 
strained transition between abstract states is safe only if all the corresponding 
concrete transitions exist. Given this notion of safeness it turns out to be very 
difficult to effectively compute a sufficiently precise safe abstract model without 
constructing the concrete one. 

In this paper we propose a method for applying abstract interpretation to
the $\mu$-calculus model checking of value-passing concurrent processes. The main
contribution is the definition of a symbolic semantics of processes in the style of
[?] whose main feature is that explicit relations of non-determinism and alter-
native choice among transitions replace classical branching. Moreover, a finite
graph for regular processes is achieved by avoiding the infinite paths of [?] due
to parameterized recursion. Model checking of the $\mu$-calculus can be per-
formed by interpreting the obtained symbolic graph over concrete environments
assigning concrete values to variables. However, since processes are capable of
exchanging values taken from a typically infinite set, the fixpoint computation
of the $\mu$-calculus semantics is not effectively computable. We define a technique to
compute a lower approximation of the $\mu$-calculus semantics by interpreting the
symbolic graph over abstract environments on a given (finite) set of abstract
values. Safeness of the lower approximation ensures the preservation of
any property. Following [?], the lower approximation is achieved by combin-
ing dual safe upper and lower abstract functions corresponding to all logical
connectives except negation. The critical case is undoubtedly that of the next
modality, where safe constrained and free transition relations among abstract
processes have to be considered. We show that explicit non-deterministic and
alternative choices between transitions allow us to avoid some typical problems
due to abstract branching, so that a more precise lower approximation of the
next modality in particular is achieved with respect to previous proposals [?].
Finally, we discuss the basic problems that typically lead to a loss of optimality in
the approximations of the next modalities.

The paper is organized as follows. Section 2 presents value-passing concurrent
processes, the $\mu$-calculus and concrete model checking. Section 3 summarizes the
basic concepts of abstract interpretation. The symbolic graph is described in
Sect. 4 and the corresponding model checking algorithm is shown in Sect. 5.
Section 6 presents abstract model checking and Sect. 7 discusses optimality of
abstract model checking.



2 Concrete Model Checking 

We consider a value-passing version of CCS. Let $Val$ be a set of values (possibly
infinite), $Chan$ a set of channels and $Var$ a set of variables. Moreover, let $Bexp$
and $Vexp$ be sets of boolean and value expressions. Processes $Proc$ are generated
by the following grammar

$$p ::= nil \mid x \mid a.p \mid be \triangledown p_1, p_2 \mid p_1 \times p_2 \mid p_1 + p_2 \mid p \setminus L \mid P(e_1, \ldots, e_n)$$

where $a \in \{c!e, c?x, \tau \mid c \in Chan, e \in Vexp, x \in Var\}$, $L \subseteq Chan$,
$be \in Bexp$ is a boolean expression and $e_i \in Vexp$ are expressions over values.
Process $c?x.p$ will receive a value $v$ on the channel $c$ and then behave
as $p[v/x]$, where $p[v/x]$ denotes the standard substitution of $v$ for all free
occurrences of $x$. Process $c!e.p$ will send the value of the expression $e$ and then
behave as $p$. The operator $+$ represents choice, while $\times$ represents parallel
composition. Process $be \triangledown p_1, p_2$ behaves as $p_1$ if the value of $be$ is true, and as
$p_2$ otherwise. The operator $\setminus L$ is the standard restriction for a set of channels
$L$. Finally, $P(x_1, \ldots, x_n)$ is a process constant, which has an associated
definition $P(x_1, \ldots, x_n) = p$. We assume the usual definitions of free variables $fv(p)$
and bound variables $bv(p)$ of processes. A process $p$ is closed iff $fv(p) = \emptyset$. In
the following, we denote by capital letters $T, P, \ldots$ open processes. For recursive
processes $P(x_1, \ldots, x_n) = p$ we assume $fv(p) \subseteq \{x_1, \ldots, x_n\}$ and we assume
the body to be guarded. The concrete semantics of processes is defined in
Table [?] of the appendix as a labelled transition system $LTS(p) = (P^*, \stackrel{a}{\mapsto})$ with
actions $a \in Act = \{\tau, c?v, c!v \mid c \in Chan, v \in Val\}$. Two semantic functions
$S_v : Vexp \to Val$ and $S_b : Bexp \to \{tt, ff\}$ for the evaluation of expressions
are used. For $a \in Act$, we define $chan(\tau) = \emptyset$, $chan(c?v) = chan(c!v) = \{c\}$.
Moreover, we denote by $\bar{a}$ the symmetric action of $a$, namely $c!v$ for $c?v$ and $c?v$
for $c!v$. Note that, if the set of values is infinite, the labelled transition system
is infinite and infinitely branching.

For expressing temporal properties of processes we consider a simple exten-
sion of the propositional $\mu$-calculus [?]. Let $Act$ be a set of actions and $VAR$ be a
set of logical variables. Formulas are inductively defined as follows.

$$A ::= X \mid A \wedge A \mid \langle K \rangle A \mid \neg A \mid \mu X.A$$

where $X \in VAR$ is a logical variable and $K \in \{\tau, c?V, c!V \mid c \in Chan, V \subseteq Val\}$.
We assume that each variable $X$ occurs positively in formulas. The modality
$\langle K \rangle$ subsumes the classical existential next modality $\langle a \rangle$ ranging over
actions $a \in Act$ and corresponds to $\bigvee_{a \in K} \langle a \rangle$. The dual universal modality is
$[K] = \neg \langle K \rangle \neg$. The operator $\mu X.A$ denotes the least fixpoint and the dual
operator of greatest fixpoint is $\nu X.A = \neg \mu X. \neg A[\neg X / X]$. Note that
formulas with $K$ infinite are needed for subsuming the logics CTL and CTL*,
since Act can be infinite. For instance, the classical liveness property 'iP A can 
be expressed as pX.A V [t]X \J ^ \J al]X . 

Traditional global model checking corresponds to computing the semantics
$\| A \|$ of the formula on the concrete labelled transition system $LTS(p) = (P^*, \stackrel{a}{\mapsto})$.
Let $\delta : VAR \to \wp(P^*)$ be a valuation assigning subsets of $P^*$ to logical variables.
The semantics of an open formula $A$ with respect to $\delta$ is defined as:

$$\begin{array}{rcl}
\| X \|_{\delta} &=& \delta(X) \\
\| A_0 \wedge A_1 \|_{\delta} &=& \| A_0 \|_{\delta} \cap \| A_1 \|_{\delta} \\
\| \langle K \rangle A \|_{\delta} &=& \bigcup_{a \in K} \| \langle a \rangle \| (\| A \|_{\delta}) \\
\| \neg A \|_{\delta} &=& P^* \setminus \| A \|_{\delta} \\
\| \mu X.A \|_{\delta} &=& \mu V. (\| A \|_{\delta[V/X]})
\end{array}$$

where $\delta[V/X]$ stands for the valuation $\delta'$, which agrees with $\delta$ except that
$\delta'(X) = V$. The next modality function $\| \langle a \rangle \| : \wp(P^*) \to \wp(P^*)$ is given by
$\| \langle a \rangle \| (S) = \{p \in P^* \mid \exists\, p \stackrel{a}{\mapsto} p', p' \in S\}$.

Therefore, $p \in \| A \|$ ($p \models A$) iff $p \in \| A \|_{\delta_0}$, where $\delta_0$ is the empty evaluation.
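
The semantic clauses above can be executed directly when the transition system is finite. The following is a minimal sketch under that assumption; the tuple encoding of formulas, the function name `sem` and the three-state example are illustrative choices, not part of the paper. The least fixpoint is computed by Kleene iteration from the empty set, which terminates on a finite state space because variables occur positively.

```python
# Formulas as nested tuples (hypothetical encoding):
#   ("var", X) | ("and", A0, A1) | ("dia", K, A) | ("not", A) | ("mu", X, A)
# where K is a set of actions.

def sem(formula, states, trans, val):
    """Return the set of states satisfying `formula` under valuation `val`."""
    kind = formula[0]
    if kind == "var":
        return val[formula[1]]
    if kind == "and":
        return sem(formula[1], states, trans, val) & sem(formula[2], states, trans, val)
    if kind == "dia":
        # <K>A: states with some K-labelled transition into ||A||.
        actions, target = formula[1], sem(formula[2], states, trans, val)
        return frozenset(p for (p, a, q) in trans if a in actions and q in target)
    if kind == "not":
        return states - sem(formula[1], states, trans, val)
    if kind == "mu":
        # Least fixpoint by iteration from the empty set.
        var, body = formula[1], formula[2]
        current = frozenset()
        while True:
            nxt = sem(body, states, trans, {**val, var: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown formula: {formula!r}")

# Derived operators: true is the complement of mu X.X (the empty set).
TRUE = ("not", ("mu", "X", ("var", "X")))

def disj(a, b):
    return ("not", ("and", ("not", a), ("not", b)))

# A three-state example: s0 --a--> s1 --b--> s2.
STATES = frozenset({"s0", "s1", "s2"})
TRANS = {("s0", "a", "s1"), ("s1", "b", "s2")}

can_a_then_b = sem(("dia", {"a"}, ("dia", {"b"}, TRUE)), STATES, TRANS, {})
# mu X. <b>true or <{a,b}>X : states from which a b-step is reachable.
can_reach_b = sem(
    ("mu", "X", disj(("dia", {"b"}, TRUE), ("dia", {"a", "b"}, ("var", "X")))),
    STATES, TRANS, {},
)
```

With an infinite set of values the set of states is infinite and this iteration is no longer effective, which is the motivation for the symbolic and abstract constructions that follow.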

3 Abstract Interpretation Theory 

In this section we briefly recall the basic ideas of abstract interpretation; we
refer the reader to [?] for more details. The theory of abstract interpretation
provides a systematic method to design approximate semantics of programs by
replacing the concrete domain of computation with a simpler abstract domain.
The relation between the concrete and the abstract domain is precisely stated
in a formal framework.

Let $(C, \leq)$ and $(A, \leq^\#)$ be two posets, where the orderings $\leq$ and $\leq^\#$ correspond
to precision. A pair of functions $(\alpha, \gamma)$, where $\alpha : C \to A$ (abstraction) and
$\gamma : A \to C$ (concretization), is called a Galois connection iff $\forall c \in C, \forall a \in A$,
$\alpha(c) \leq^\# a \Leftrightarrow c \leq \gamma(a)$. These requirements can also be captured by saying that
$\gamma \circ \alpha$ is extensive ($c \leq \gamma(\alpha(c))$), $\alpha \circ \gamma$ is reductive ($\alpha(\gamma(a)) \leq^\# a$), and $\alpha$ and
$\gamma$ are total and monotonic. If $\alpha(\gamma(a)) = a$, then $(\alpha, \gamma)$ is called a Galois insertion.

Intuitively, the condition $c \leq \gamma(\alpha(c))$ ensures that the loss of information of
abstraction is safe. On the other hand, the condition $\alpha(\gamma(a)) \leq^\# a$ ensures that
the concretization process introduces no loss of information. Let $S(P)$ be the
semantics of a program $P$ computed as the least fixpoint of a semantic function
$F$ over the concrete domain $(C, \leq)$. The goal is that of computing an
approximate semantics $S^\#(P)$ over $(A, \leq^\#)$ that is safe, namely
$\alpha(S(P)) \leq^\# S^\#(P)$. The main result is that a safe approximate semantics
$S^\#(P)$ can be computed as the least fixpoint of a safe approximate semantic
function $F^\#$ over $(A, \leq^\#)$, such that $\alpha(F(c)) \leq^\# F^\#(\alpha(c))$, for each $c \in C$.
Moreover, it has been shown that there always exists a best approximate semantic
function $F^\#$ (optimal), for which $\alpha(F(c)) = F^\#(\alpha(c))$. In the Galois insertion
case this is equivalent to $\alpha(F(\gamma(a))) = F^\#(a)$, for each $a \in A$.
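
As an illustration of these definitions, the sketch below checks the Galois connection and Galois insertion conditions exhaustively on a small sign domain. Everything in it is an assumption made for the example: the finite universe of integers, the four abstract values and the names `GAMMA`, `alpha` and `leq` do not come from the paper.

```python
from itertools import combinations

# Finite stand-in for the concrete values (assumption for the example).
UNIVERSE = range(-3, 4)

# Concretization of each abstract value; "neg" covers n <= 0, "pos" covers n > 0.
GAMMA = {
    "bot": frozenset(),
    "neg": frozenset(n for n in UNIVERSE if n <= 0),
    "pos": frozenset(n for n in UNIVERSE if n > 0),
    "top": frozenset(UNIVERSE),
}

def leq(a, b):
    """Abstract precision order, read off the concretizations."""
    return GAMMA[a] <= GAMMA[b]

def alpha(c):
    """Most precise abstract value whose concretization contains c."""
    candidates = [a for a in GAMMA if c <= GAMMA[a]]
    return min(candidates, key=lambda a: len(GAMMA[a]))

def subsets(xs):
    """All subsets of xs as frozensets."""
    xs = list(xs)
    for size in range(len(xs) + 1):
        for combo in combinations(xs, size):
            yield frozenset(combo)

# alpha(c) <=# a  iff  c <= gamma(a), for every concrete c and abstract a.
is_connection = all(
    leq(alpha(c), a) == (c <= GAMMA[a])
    for c in subsets(UNIVERSE)
    for a in GAMMA
)

# alpha(gamma(a)) = a: no abstract value is redundant.
is_insertion = all(alpha(GAMMA[a]) == a for a in GAMMA)
```

Here `alpha(frozenset({-1, 1}))` is `"top"`: the set mixes both signs, so abstraction loses information, but safely, since the set is still contained in the concretization of the result.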

4 The Symbolic Graph 

In this section we define the symbolic graph of processes. Symbolic semantics as
introduced in [?] relies on the idea of using symbolic actions instead of concrete
actions. For instance, transitions modeling input are represented by a single
transition $c?x.p \stackrel{c?x}{\longmapsto} p$. To handle conditional open processes symbolic actions
depend in addition on boolean guards. Thus, a symbolic transition $T \stackrel{c,\theta}{\longmapsto} T'$
represents all concrete transitions with action $a$ corresponding to symbolic action
$\theta$ for the assignments to the free variables of $T$ such that guard $c$ is satisfied. Our
definition of the symbolic graph differs in some aspects from the classical one.
First, classical branching is replaced by explicit relations of non-deterministic
($\oplus$) and alternative choices ($\otimes$) among transitions. Moreover, a method to avoid
infinite applications of the standard rule for recursion is proposed. This way a
finitely branching and finite graph for regular processes is obtained.

We use the standard notions of substitutions and environments. A substi-
tution is a partial function $\sigma : Var \to Vexp$ and a simple substitution is an
injective $\sigma : Var \to Var$. We denote by $Sub$ the set of simple substitutions. For
a substitution $\sigma$ we denote by $tar(\sigma)$ and $dom(\sigma)$ its target and source, respec-
tively. An environment is a total function $\rho : Var \to Val$ and $Env$ is the set of
environments. For an environment $\rho$ we denote by $\rho[x \to v]$ the environment that
agrees with $\rho$ except that the value assigned to variable $x$ is $v$. Let $T$ be an open
process and $x \in Var$. We say that $x$ is fresh in $T$ iff $x \notin fv(T) \cup bv(T)$. Moreover,
we say that a term $T$ is free for a simple substitution $\sigma$ with $dom(\sigma) \subseteq fv(T)$,
iff for each $x \in fv(tar(\sigma))$, $x \notin bv(T) \cup (fv(T) \setminus tar(\sigma))$. Let $T$ be an open
process and $\rho$ an environment. We denote by $(T, \rho)$ the closed process $T\rho$, that
is obtained by substituting $\rho(x)$ for variable $x$ for each $x \in fv(T)$.

Let $\mathcal{C}$ be the set of constraints with $c ::= be \mid e = e \mid e \in V \mid true \mid
\neg c \mid c \wedge c$, where $e \in Vexp$, $be \in Bexp$ and $V \subseteq Val$. We denote by $vars(c)$
the variables occurring in constraint $c$. For $c \in \mathcal{C}$ we consider the semantic
function $\| \cdot \| : \mathcal{C} \to \wp(Env)$ obtained in the trivial way by $\| be \| =
\{\rho \in Env \mid S_b(be\,\rho) = tt\}$, $\| e_1 = e_2 \| = \{\rho \in Env \mid S_v(e_1\rho) = S_v(e_2\rho)\}$ and
$\| e \in V \| = \{\rho \in Env \mid S_v(e\rho) \in V\}$. We use the notation $\rho \models c$ for $\rho \in \| c \|$.

Let the symbolic actions be $SymAct = \{c?x, c!e, \tau \mid c \in Chan, x \in Var, e \in
Vexp\}$ with $bv(c?x) = \{x\}$, $bv(c!e) = bv(\tau) = \emptyset$ and $fv(c!e) = vars(e)$, $fv(\tau) =
fv(c?x) = \emptyset$.

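Transitions of the symbolic semantics are $T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$ such that
for each $i \in \{1,n\}$ (that is, $i \in \{1, \ldots, n\}$)

1. $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}} \theta_{i,j_i})$ with $\theta_{i,j_i} \in SymAct \cup \{*\}$ and $c_i \in \mathcal{C}$;

2. $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$, where $T_{i,j_i}$ are open processes;

3. $vars(c_i) \subseteq fv(T)$, $fv(\theta_{i,j_i}) \subseteq fv(T)$ and $fv(T_{i,j_i}) \subseteq fv(T) \cup bv(\theta_{i,j_i})$.

All possible behaviors of closed processes obtained from $T$ are represented by
a single transition $T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$, where alternative choices are related
by $\otimes$ and non-deterministic choices by $\oplus$. The idea is that for each environment
$\rho$ there exists a unique alternative $c_i$ that is satisfied and symbolic actions $\theta_{i,j_i}$
with corresponding processes $T_{i,j_i}$ represent the concrete transitions of $(T, \rho)$.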

Let us explain informally the construction of transitions. The complete se-
mantic rules are shown in Table [?] of the appendix.

Transitions of a basic process $a.T$ are obtained by a rule $a.T \stackrel{(true,\, a \oplus *)}{\longmapsto} T \oplus a.T$,
where the special action $*$ denotes the idle action and is used in the parallel
composition rule.

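The non-deterministic choice of $+$ is reflected by the composition of transi-
tions with $\oplus$. The rule for choice is as follows

$$\frac{\{T_i \stackrel{\otimes_{j_i \in \{1,n_i\}} \theta_{i,j_i}}{\longmapsto} \otimes_{j_i \in \{1,n_i\}} \Omega_{i,j_i}\}_{i \in \{1,2\}}}{T_1 + T_2 \stackrel{\otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \theta_{j_1,j_2}}{\longmapsto} \otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \Omega_{j_1,j_2}}$$

where $\theta_{j_1,j_2}$ is guarded by constraint $c_{1,j_1} \wedge c_{2,j_2}$ and where all non-deterministic
choices of $\theta_{i,j_i} = (c_{i,j_i}, \oplus_{h_i} \theta_{i,j_i,h_i})$ for $i \in \{1,2\}$ are merged by $\oplus$. The
resulting processes are analogously combined by $\Omega_{j_1,j_2}$. In fact, for each envi-
ronment that satisfies both guards both actions of $T_1$ and of $T_2$ can be chosen.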



Example 1. For instance, a process $T = c!x.T_1 + a!x.T_2$ is modeled by
$c!x.T_1 + a!x.T_2 \stackrel{(true,\, c!x \oplus a!x \oplus *)}{\longmapsto} T_1 \oplus T_2 \oplus T$.



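The alternative choices of a conditional process are related by $\otimes$ through the
following rule

$$\frac{\{T_i \stackrel{\otimes_{j_i \in \{1,n_i\}} \theta_{i,j_i}}{\longmapsto} \otimes_{j_i \in \{1,n_i\}} \Omega_{i,j_i}\}_{i \in \{1,2\}}}{be \triangledown T_1, T_2 \stackrel{\otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \theta'_{i,j_i}}{\longmapsto} \otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \Omega_{i,j_i}}$$

where $\theta'_{1,j_1}$ is equivalent to $\theta_{1,j_1}$ where the guard is additionally constrained by
$be$, while $\theta'_{2,j_2}$ is equivalent to $\theta_{2,j_2}$ where the guard is additionally constrained
by $\neg be$. Intuitively, transitions of $be \triangledown T_1, T_2$ are either transitions of $T_1$, if $be$ is
satisfied, or transitions of $T_2$, if $be$ is not satisfied.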



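Example 2. For $T = x > 0 \triangledown c!x.T_1 + a!x.T_2,\ b!x.nil$ we have $T \stackrel{\theta_1 \otimes \theta_2}{\longmapsto} \Omega_1 \otimes \Omega_2$,
where $\theta_1 = (x > 0, c!x \oplus a!x \oplus *)$, $\theta_2 = (x \leq 0, b!x \oplus *)$, $\Omega_1 = T_1 \oplus T_2 \oplus T$
and $\Omega_2 = nil \oplus T$. In fact, for each environment $\rho$ either $x > 0$ or $x \leq 0$ is true
and the process $(T, \rho)$ is able to perform either both $c!\rho(x)$ and $a!\rho(x)$ or $b!\rho(x)$,
respectively.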
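
The transition of Example 2 can be written down as plain data and interpreted over a concrete environment. This is a rough sketch only: the `Alternative` class, the string encoding of actions and the function `enabled` are hypothetical names introduced here, and the code handles just output actions whose payload is a single variable, which is all Example 2 needs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]

@dataclass
class Alternative:
    """One alternative theta_i = (c_i, theta_i1 (+) ... ) with its targets Omega_i."""
    guard: Callable[[Env], bool]   # the constraint c_i
    actions: List[str]             # symbolic actions; "*" is the idle action
    targets: List[str]             # names of the processes T_ij

# Example 2: T = x > 0 |> c!x.T1 + a!x.T2, b!x.nil
TRANSITION = [
    Alternative(lambda rho: rho["x"] > 0, ["c!x", "a!x", "*"], ["T1", "T2", "T"]),
    Alternative(lambda rho: rho["x"] <= 0, ["b!x", "*"], ["nil", "T"]),
]

def enabled(transition: List[Alternative], rho: Env) -> List[Tuple[str, str]]:
    """Concrete moves of (T, rho): pick the unique satisfied alternative,
    then instantiate each non-idle symbolic action with the value in rho."""
    satisfied = [alt for alt in transition if alt.guard(rho)]
    if len(satisfied) != 1:
        raise ValueError("exactly one alternative must hold in each environment")
    moves = []
    for action, target in zip(satisfied[0].actions, satisfied[0].targets):
        if action == "*":
            continue  # idle is only used when composing in parallel
        channel, variable = action.split("!")
        moves.append((f"{channel}!{rho[variable]}", target))
    return moves

moves_positive = enabled(TRANSITION, {"x": 3})
moves_negative = enabled(TRANSITION, {"x": -1})
```

The two levels of the structure mirror the two relations: the outer list holds the alternatives related by the alternative-choice operator, of which exactly one applies per environment, and each inner list holds the non-deterministic choices available once that alternative is selected.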



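The rule of parallel composition is quite complex. Since a single transition
must represent all behaviors combined by the relations $\oplus$ and $\otimes$, a single rule
performs at the same time synchronization and interleaving. The rule is defined
as follows

$$\frac{\{T_i \stackrel{\otimes_{j_i \in \{1,n_i\}} \theta_{i,j_i}}{\longmapsto} \otimes_{j_i \in \{1,n_i\}} \Omega_{i,j_i}\}_{i \in \{1,2\}}}{T_1 \times T_2 \stackrel{\otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \theta_{j_1,j_2}}{\longmapsto} \otimes_{i \in \{1,2\}, j_i \in \{1,n_i\}} \Omega_{j_1,j_2}}$$

where $\theta_{j_1,j_2}$ is constrained by $c_{1,j_1} \wedge c_{2,j_2}$ and its actions are all possible
combinations of actions $\theta_{1,j_1,h_1}$ and $\theta_{2,j_2,h_2}$ for $\theta_{i,j_i} = (c_{i,j_i}, \oplus_{h_i} \theta_{i,j_i,h_i})$ for
$i \in \{1,2\}$. The combination of processes corresponding to the combination of
actions is realized by $\Omega_{j_1,j_2}$.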



Example 3. Consider an open process $T_1 \times T_2$ with $T_1 = x > 0 \triangledown c?x.T, a?x.T$ and
$T_2 = c!y+1.T + a!y-1.T$. We have
$T_1 \stackrel{(x > 0,\, c?x \oplus *) \otimes (x \leq 0,\, a?x \oplus *)}{\longmapsto} T \oplus T_1 \otimes T \oplus T_1$
and $T_2 \stackrel{(true,\, c!y+1 \oplus a!y-1 \oplus *)}{\longmapsto} T \oplus T \oplus T_2$. The transition resulting from parallel
composition is $T_1 \times T_2 \stackrel{\theta_1 \otimes \theta_2}{\longmapsto} \Omega_1 \otimes \Omega_2$ where
$\theta_1 = (x > 0, \tau \oplus c?z \oplus c!y+1 \oplus a!y-1 \oplus *)$,
$\theta_2 = (x \leq 0, \tau \oplus a?z \oplus c!y+1 \oplus a!y-1 \oplus *)$,
$\Omega_1 = T[y+1/x] \times T \oplus T[z/x] \times T_2 \oplus T_1 \times T \oplus T_1 \times T \oplus T_1 \times T_2$ and
$\Omega_2 = T[y-1/x] \times T \oplus T[z/x] \times T_2 \oplus T_1 \times T \oplus T_1 \times T \oplus T_1 \times T_2$.
Two alternative choices corresponding to constraints
$x > 0$ and $x \leq 0$ arise from composition of guard $x > 0$ with $true$ and of guard
$x \leq 0$ with $true$, respectively. For guard $x > 0$ the non-deterministic choices are
obtained by the composition of actions $c?x$ and $*$ with $c!y+1$, $a!y-1$ and $*$, while
for guard $x \leq 0$ they are obtained by the composition of actions $a?x$ and $*$ with
$c!y+1$, $a!y-1$ and $*$. Therefore, $\tau$ actions are obtained from the synchronization
of actions $c?x$ and $c!y+1$ and of $a?x$ with $a!y-1$, respectively. The others
arise from interleaving by composition with $*$. The corresponding processes are
obtained in the obvious way. For instance, the process corresponding to $\tau$ with
guard $x > 0$ is $T[y+1/x] \times T$, since the value of $y+1$ is received by $T_1$. Note that
in the interleaving case with a receive action the variable $x$ must be renamed to
$z$ to avoid a clash with the free variables of $T_2$.

Recursive processes are handled by the classical rule, where formal parame-
ters are substituted by actual parameters.

$$\frac{T[e/x] \stackrel{\otimes_{i} \theta_i}{\longmapsto} \otimes_{i} \Omega_i}{P(e) \stackrel{\otimes_{i} \theta_i}{\longmapsto} \otimes_{i} \Omega_i}\ \ rec \qquad P(x) = T$$

Unfortunately, the application of rule rec leads to an infinite graph even for
regular processes, where there is no parallel composition inside the scope of
recursion. The semantics of [?] suffers from the same problem.

Example 4. Let us consider the process $P(x) = c!x.(P(x+1) + P(x-1))$. Since
the recursive process is unfolded infinitely many times with a different argument, an
infinite graph arises.

$P(x) \stackrel{(true,\, c!x \oplus *)}{\longmapsto} P(x+1) + P(x-1) \oplus P(x)$

$P(x+1) + P(x-1) \stackrel{(true,\, c!x+1 \oplus c!x-1 \oplus *)}{\longmapsto} P(x+1+1) + P(x+1-1) \oplus P(x-1+1) + P(x-1-1) \oplus P(x+1) + P(x-1)$

This problem can be solved by replacing in the graph transitions of $P(e)$ by
transitions of a process $P(x)$ for fresh variables $x$. The semantics of $P(e)$ can
naturally be obtained from the semantics of $P(x)$ by instantiating parameters $x$
to the actual values corresponding to the evaluation of the expressions $e$.

Let general processes $\mathcal{GP}$ have the following syntax

$$GP ::= nil \mid a.T \mid P(x) \mid GP_1 + GP_2 \mid be \triangledown GP_1, GP_2 \mid GP_1 \times GP_2 \mid GP \setminus L$$

where $x$ is a tuple of distinct variables and $T \in Proc$ is a process.

Note that $a.T$ is a general process for any process $T$, since there are no
current recursive calls. For general processes $GP \in \mathcal{GP}$, the recursion variables
$rv(GP)$ are defined as $rv(a.T) = rv(nil) = \emptyset$, $rv(P(x)) = \{x\}$, $rv(GP_1 + GP_2) =
rv(GP_1 \times GP_2) = rv(be \triangledown GP_1, GP_2) = rv(GP_1) \cup rv(GP_2)$ and $rv(GP \setminus L) =
rv(GP)$.

Our aim is that of finding a general process $GP$ that can be used instead of
$T$ in the graph. For this purpose we define most general terms.

Definition 5. Let $T$ be a process. We define most general terms $\Pi(T)$:

- if $T = nil$ or $T = a.T_1$, $\Pi(T) = \{T\}$;

- for $GP_i \in \Pi(T_i)$ such that $rv(GP_1) \cap fv(GP_2) = \emptyset$, $rv(GP_1) \cap bv(GP_2) = \emptyset$
and vice-versa, then $GP_1 + GP_2 \in \Pi(T_1 + T_2)$, $GP_1 \times GP_2 \in \Pi(T_1 \times T_2)$
and $be \triangledown GP_1, GP_2 \in \Pi(be \triangledown T_1, T_2)$;

- if $P(z) = T$ and $T$ is free for $[x/z]$, then $P(x) \in \Pi(P(e))$.

Most general terms are general processes obtained by introducing fresh and
distinct variables in current recursive calls. The following property is satisfied.

Proposition 6. Let $T$ be a process. For each $GP \in \Pi(T)$, there exists a sub-
stitution $\sigma$ with $dom(\sigma) = rv(GP)$ such that $GP\sigma = T$.

The substitution $\sigma$ assigns actual parameters of $T$ to formal parameters of $GP$.
For $\rho \in Env$, let $gen_{T,GP} : Env \to Env$, such that $gen_{T,GP}(\rho)(x) = \rho(x)$, for
$x \notin rv(GP)$, and $gen_{T,GP}(\rho)(x) = S_v(\sigma(x)\rho)$, otherwise. In the environment $gen_{T,GP}(\rho)$
formal parameters of $GP$ are instantiated to the values of the actual parameters
provided by $\sigma$. The result is that the closed process $(GP, gen_{T,GP}(\rho))$ is equiv-
alent (bisimilar) to $(T, \rho)$. For processes $p_1, p_2 \in Proc$, we say that $p_1 \equiv p_2$
iff for each $a \in Act$, for each $p_1 \stackrel{a}{\mapsto} p_1'$ there exists $p_2 \stackrel{a}{\mapsto} p_2'$ with $p_1' \equiv p_2'$, and
vice-versa.

Proposition 7. Let $T$ be a process and $\rho \in Env$. For each $GP \in \Pi(T)$,
$(GP, gen_{T,GP}(\rho)) \equiv (T, \rho)$.

By proposition 7 the symbolic graph where transitions of $T$ are replaced by
transitions of $GP$ correctly models the behavior of processes.

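Definition 8. Let $T$ be a process. We define $\mathcal{SG}(T) = (GP^*, T^*, \longmapsto)$ with
$GP^* \subseteq \mathcal{GP}$, $T^* \subseteq Proc$ and transitions $GP \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$ for each
$GP \in GP^*$, where

1. for each $T' \in T^*$ there exists $GP' \in GP^* \cap \Pi(T')$;

2. $T \in T^*$ and for each $GP \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$, where $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$,
$T_{i,j_i} \in T^*$, for $i \in \{1,n\}$, $j_i \in \{1,n_i\}$.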



Example 9. Consider for instance the process $P(1)$, where $P(x)$ is defined in
example 4. Since $P(z) + P(w) \in \Pi(P(x+1) + P(x-1))$ then $\mathcal{SG}(P(1))$ is finite:

$P(x) \stackrel{(true,\, c!x \oplus *)}{\longmapsto} P(x+1) + P(x-1) \oplus P(x)$

$P(z) + P(w) \stackrel{(true,\, c!z \oplus c!w \oplus *)}{\longmapsto} P(z+1) + P(z-1) \oplus P(w+1) + P(w-1) \oplus P(z) + P(w)$

This graph correctly describes the behavior of $P(1)$. For instance, the concrete
computation $P(1) \stackrel{c!1}{\mapsto} P(1+1) + P(1-1) \stackrel{c!2}{\mapsto} P(1+1+1) + P(1+1-1) \ldots$
is simulated by $(P(x), \rho_1) \stackrel{c!1}{\mapsto} (P(z) + P(w), \rho_2) \ldots$, where $\rho_1(x) = 1$ and
$\rho_2(z) = 2$, $\rho_2(w) = 0$. Environment $\rho_2 = gen_{T,GP}(\rho_1)$ assigns to parameters $z$
and $w$ the result of the evaluation of expressions $x+1$ and $x-1$ with respect
to $\rho_1$. Note that in $P(z) + P(w)$ distinct fresh variables are used to model all
recursive calls $P(e_1) + P(e_2)$, where $e_1$ may be different from $e_2$.
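
The way the environment carries the growing arguments while the graph stays finite can be sketched in a few lines. This is an illustrative reading of Example 9 only: the function `gen` and the encoding of the substitution as a dictionary of Python callables are assumptions of the sketch, not the paper's definitions.

```python
def gen(sigma, rho):
    """Sketch of gen_{T,GP}: keep rho outside the recursion variables and bind
    each recursion variable to the value of its actual parameter under rho."""
    new_env = dict(rho)
    for variable, expression in sigma.items():
        # All expressions are evaluated in the old environment rho.
        new_env[variable] = expression(rho)
    return new_env

# Step P(x) -> P(x+1) + P(x-1), represented by the general process P(z) + P(w).
sigma_from_x = {"z": lambda rho: rho["x"] + 1, "w": lambda rho: rho["x"] - 1}
rho1 = {"x": 1}
rho2 = gen(sigma_from_x, rho1)

# Next step, choosing the branch P(z): P(z) -> P(z+1) + P(z-1), which is
# again represented by the same node P(z) + P(w).
sigma_from_z = {"z": lambda rho: rho["z"] + 1, "w": lambda rho: rho["z"] - 1}
rho3 = gen(sigma_from_z, rho2)
```

After two steps the node is still $P(z) + P(w)$; only the environment has changed, to $z = 3$ and $w = 1$, matching the concrete process $P(1+1+1) + P(1+1-1)$.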
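
The equivalence is formally stated by the following theorem. Let $K \in \{\tau, c!V,
c?V\}$ and $\theta \in SymAct$. We define the constraint $\theta \in K \in \mathcal{C}$ as $\theta \in K = true$,
for $K = \theta = \tau$, $\theta \in K = true$, for $\theta = c?x$ and $K = c?V$, $\theta \in K = e \in V$, for
$\theta = c!e$ and $K = c!V$, and $\theta \in K = false$, otherwise. For $K = a$ we denote by
$a = \theta$ the constraint $\theta \in a$.

Theorem 10. 1. For each symbolic transition $GP \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$ where
$\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}} \theta_{i,j_i})$ and $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$ and for each $\rho \in Env$,
there exists one and only one $i \in \{1,n\}$, such that $\rho \models c_i$ and, for each
$j_i \in \{1,n_i\}$, there exists $a \in Act$ with $\rho \models a = \theta_{i,j_i}$ such that $(GP, \rho) \stackrel{c?v}{\mapsto}
(T_{i,j_i}, \rho[x \to v])$ for $\theta_{i,j_i} = c?x$ and $v \in Val$, and $(GP, \rho) \stackrel{a}{\mapsto} (T_{i,j_i}, \rho)$, for
$\theta_{i,j_i} \in \{\tau, c!e\}$;

2. for each $p \in P^*$ there exists $GP \in GP^*$ and $\rho \in Env$ such that $GP\rho = p$ and,
for each $p \stackrel{a}{\mapsto} p'$, there exist $i \in \{1,n\}$ and $j_i \in \{1,n_i\}$, where
$GP \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$
with $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}} \theta_{i,j_i})$ and $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$, and $\rho \models
c_i \wedge a = \theta_{i,j_i}$ and $p' = (T_{i,j_i}, \rho[x \to v])$, for $a = c?v$ and $\theta_{i,j_i} = c?x$, and
$p' = (T_{i,j_i}, \rho)$, for $a \in \{\tau, c!v\}$.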



By theorem 10 the concrete semantics is safely represented by the symbolic one
and in addition concrete non-deterministic and alternative choices are exactly
composed by $\oplus$ and $\otimes$ in symbolic transitions. Since transitions are restricted
to general processes, infinite applications of rec with different arguments are
avoided and the symbolic graph is finite for regular processes.



Theorem 11. Let $p \in Proc$ be a regular process. $\mathcal{SG}(p)$ is a finite graph up to
renaming.



5 Symbolic Model Checking 

In this section we show that model checking can be realized by interpreting the
symbolic graph over concrete environments. Let $(\wp(D), \subseteq)$, with $D = \{(T, \rho) \mid
T \in GP^*$ and $\rho \in Env\}$ for $\mathcal{SG}(p) = (GP^*, T^*, \longmapsto)$. The semantics $\| A \|^s$ of
a formula $A$ is defined as in Sect. 2 by replacing evaluations with symbolic
evaluations $\delta : VAR \to \wp(D)$ and by taking the following function for the next
modality. For $K \in \{\tau, c!V, c?V\}$ and $S \in \wp(D)$,

- $\| \langle K \rangle \|^s (S) = \{(T, \rho) \mid T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$, with $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}}
\theta_{i,j_i})$ and $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$, and there exist $i \in \{1,n\}$ and $j_i \in \{1,n_i\}$,
such that $\rho \models c_i \wedge \theta_{i,j_i} \in K$ and $(T_{i,j_i}, \rho) \in_{\equiv} S$, for $\theta_{i,j_i} \in \{\tau, c!e\}$, and
$(T_{i,j_i}, \rho[x \to v]) \in_{\equiv} S$, for some $v \in V$ for $K = c?V$ and $\theta_{i,j_i} = c?x\}$,

where $(T, \rho) \in_{\equiv} S$ if $(GP, gen_{T,GP}(\rho)) \in S$ for some $GP \in \Pi(T)$.

The definition of $\| \langle a \rangle \|^s$ is based on the observation that process $(T, \rho)$
is able to perform an action $a$, if the environment satisfies a guard $c_i$ for which
there exists a symbolic action $\theta_{i,j_i}$ corresponding to $a$. Moreover, the resulting
process must be in $S$. However, since $S$ is a set of general processes and $T_{i,j_i}$ is
not a general process we look for an equivalent process $(GP, gen_{T_{i,j_i},GP}(\rho)) \in S$.

Theorem 12. For each closed formula $A$, $\| A \|^s \cap P^* = \| A \|$.

6 Abstract Model Checking 

In this section we define the approximate semantics of the collecting semantics
$\| A \|^s$ on the abstract domain obtained by replacing concrete environments
with abstract environments on an abstract values domain.

Let $(\alpha_v, \gamma_v)$ be a Galois insertion between the complete lattices $(\wp(Val), \subseteq)$
and $(\wp(Val^\#), \subseteq)$ of concrete and abstract values. We consider the set of abstract
environments $Env^\# = \{\rho^\# \mid \rho^\# : Var \to Val^\#\}$ and the set of abstract processes
$D^\# = \{(T, \rho^\#) \mid T \in GP^*$ and $\rho^\# \in Env^\#\}$ for $\mathcal{SG}(p) = (GP^*, T^*, \longmapsto)$.
Let $\gamma_e : \wp(Env^\#) \to \wp(Env)$ and $\gamma : \wp(D^\#) \to \wp(D)$ be the obvious functions
induced by $\gamma_v$.

Our purpose is that of computing a lower approximation of the semantics
$\| A \|^s$, namely $\| A \|^l$ such that $\gamma(\| A \|^l) \subseteq \| A \|^s$. Given the relation between concrete and
abstract domain an approximate semantics can be obtained by replacing concrete
functions with corresponding safe abstract functions. However, the operator of
negation is not monotonic. Kelb [?] argues that a lower approximation of the
full logic with negation can be obtained by combining dual approximations for
formulas without negation, $\| A \|^l$ and $\| A \|^u$. In fact, $\| \neg A \|^l = D^\# \setminus \| A \|^u$ is
a safe lower approximation of $\| \neg A \|^s$, since $\| A \|^s \subseteq \gamma(\| A \|^u)$. Analogously,
$\| \neg A \|^u = D^\# \setminus \| A \|^l$ is a safe upper approximation of $\| \neg A \|^s$. Thus, the
problem is reduced to the definition of safe dual approximations for all the
logical operators except negation. The safeness of the dual approximations is
formally defined with respect to the following framework.

Proposition 13. There exist $\alpha^u, \alpha^l : \wp(D) \to \wp(D^\#)$ such that $(\alpha^u, \gamma)$ is a
Galois insertion between $(\wp(D), \subseteq)$ and $(\wp(D^\#), \subseteq)$ and $(\alpha^l, \gamma)$ is a Galois in-
sertion between $(\wp(D), \supseteq)$ and $(\wp(D^\#), \supseteq)$.

We show that both the upper and the lower approximation can be computed
over the symbolic graph with respect to abstract environments. We exploit the
following constraints based on relations $\oplus$ and $\otimes$. The symbol $\supset$ denotes with
an abuse of notation the equivalent constraint.

Definition 14. Let $T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$ with $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}} \theta_{i,j_i})$ and
$\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$ be a symbolic transition. For each $K \in \{\tau, c!V, c?V\}$ and
$i \in \{1,n\}$, $j_i \in \{1,n_i\}$ and $I = \{I_i\}_{i \in \{1,n\}}$ with $I_i \subseteq \{1,n_i\}$ we define

1. $FREE_{K,i,j_i} = c_i \wedge \theta_{i,j_i} \in K$;

2. $CON_{K,I} = \bigwedge_{i \in \{1,n\}} c_i \supset (\bigvee_{j_i \in I_i} \theta_{i,j_i} \in K)$.

The abstract semantics is based on safe lower and upper approximations of
the previous constraints and on dual approximations of the relation $\in_{\equiv}$. Note
that for each $c \in \mathcal{C}$, $\| c \|^u$ and $\| c \|^l$ are safe approximations iff $\| c \| \subseteq \gamma_e(\| c \|^u)$
and $\gamma_e(\| c \|^l) \subseteq \| c \|$, respectively. Moreover, $\in^u_{\equiv}$ is a safe upper approximation,
if $\rho \in \gamma_e(\rho^\#)$ such that $(T, \rho) \in_{\equiv} \gamma(S^\#)$ implies $(T, \rho^\#) \in^u_{\equiv} S^\#$, while $\in^l_{\equiv}$
is a safe lower approximation if $(T, \rho^\#) \in^l_{\equiv} S^\#$ only if, for all $\rho \in \gamma_e(\rho^\#)$,
$(T, \rho) \in_{\equiv} \gamma(S^\#)$.

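Let $\delta^\# : VAR \to \wp(D^\#)$ be an abstract evaluation. We define the abstract
upper and lower semantics of an open formula $A$ with respect to $\delta^\#$ as follows.

Lower Approximation

$$\begin{array}{rcl}
\| X \|^l_{\delta^\#} &=& \delta^\#(X) \\
\| A_0 \wedge A_1 \|^l_{\delta^\#} &=& \| A_0 \|^l_{\delta^\#} \cap \| A_1 \|^l_{\delta^\#} \\
\| \langle K \rangle A \|^l_{\delta^\#} &=& \| \langle K \rangle \|^l (\| A \|^l_{\delta^\#}) \\
\| \mu X.A \|^l_{\delta^\#} &=& \mu V. (\| A \|^l_{\delta^\#[V/X]}) \\
\| \neg A \|^l_{\delta^\#} &=& D^\# \setminus (\| A \|^u_{\delta^\#})
\end{array}$$

where, for $S^\# \in \wp(D^\#)$,

- $\| \langle K \rangle \|^l (S^\#) = \{(T, \rho^\#) \mid T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$, with $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}}
\theta_{i,j_i})$ and $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$, there exists $I = \{I_i\}_{i \in \{1,n\}}$ with $I_i \subseteq
\{1,n_i\}$, such that $I$ is minimal and $\rho^\# \in \| CON_{K,I} \|^l$, and, for each $i \in \{1,n\}$
and $j_i \in I_i$, $(T_{i,j_i}, \rho^\#) \in^l_{\equiv} S^\#$, for $\theta_{i,j_i} \in \{c!e, \tau\}$, and $(T_{i,j_i}, \rho^\#[x \to
\alpha_v(v)]) \in^l_{\equiv} S^\#$, for $v \in V$, $K = c?V$ and $\theta_{i,j_i} = c?x\}$.

Upper Approximation

$$\begin{array}{rcl}
\| X \|^u_{\delta^\#} &=& \delta^\#(X) \\
\| A_0 \wedge A_1 \|^u_{\delta^\#} &=& \| A_0 \|^u_{\delta^\#} \cap \| A_1 \|^u_{\delta^\#} \\
\| \langle K \rangle A \|^u_{\delta^\#} &=& \| \langle K \rangle \|^u (\| A \|^u_{\delta^\#}) \\
\| \mu X.A \|^u_{\delta^\#} &=& \mu V. (\| A \|^u_{\delta^\#[V/X]}) \\
\| \neg A \|^u_{\delta^\#} &=& D^\# \setminus (\| A \|^l_{\delta^\#})
\end{array}$$

where, for $S^\# \in \wp(D^\#)$,

- $\| \langle K \rangle \|^u (S^\#) = \{(T, \rho^\#) \mid T \stackrel{\otimes_{i \in \{1,n\}} \theta_i}{\longmapsto} \otimes_{i \in \{1,n\}} \Omega_i$, with $\theta_i = (c_i, \oplus_{j_i \in \{1,n_i\}}
\theta_{i,j_i})$ and $\Omega_i = \oplus_{j_i \in \{1,n_i\}} T_{i,j_i}$, and there exist $i \in \{1,n\}$ and $j_i \in \{1,n_i\}$
such that $\rho^\# \in \| FREE_{K,i,j_i} \|^u$ and $(T_{i,j_i}, \rho^\#) \in^u_{\equiv} S^\#$, for $\theta_{i,j_i} \in \{\tau, c!e\}$,
and $(T_{i,j_i}, \rho^\#[x \to \alpha_v(v)]) \in^u_{\equiv} S^\#$, for $v \in V$, $K = c?V$ and $\theta_{i,j_i} = c?x\}$.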



Let us explain the dual approximations of the next modality.

1. The upper approximation is safe, if $(T, \rho) \in \| \langle a \rangle \|^s (\gamma(S^\#))$ implies
$(T, \rho^\#) \in \| \langle a \rangle \|^u (S^\#)$ for $\rho \in \gamma_e(\rho^\#)$. In other words, safeness requires
to consider at least the abstract transitions corresponding to the concrete
$a$-transitions (free transition relation [?]). If constraint $FREE_{a,i,j_i} =
c_i \wedge \theta_{i,j_i} = a$ is safely upper approximated the previous condition is guar-
anteed, since $\rho \models c_i \wedge \theta_{i,j_i} = a$ implies $\rho^\# \in \| c_i \wedge \theta_{i,j_i} = a \|^u$.

2. The definition of the lower approximation is quite complex. In this case
safeness is ensured, only if $(T, \rho^\#) \in \| \langle a \rangle \|^l (S^\#)$ implies $(T, \rho) \in \| \langle
a \rangle \|^s (\gamma(S^\#))$, for each $\rho \in \gamma_e(\rho^\#)$. In other words, safeness requires to
consider only the abstract transitions, for which all corresponding concrete
$a$-transitions exist (constrained transition relation [?]). Let us consider the



A Symbolic Semantics for Abstract Model Checking 145 



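constraint $CON_{a,I} = \bigwedge_{i \in \{1,n\}} c_i \supset (\bigvee_{j_i \in I_i} \theta_{i,j_i} = a)$ and its lower ap-
proximation. If $\rho^\# \in \| CON_{a,I} \|^l$ then for each $\rho \in \gamma_e(\rho^\#)$, if $\rho \models c_i$ there
exists an action $\theta_{i,j_i}$ with $j_i \in I_i$ corresponding to $a$. By theorem 10 for each
$\rho$ there exists exactly one $c_i$ such that $\rho \models c_i$ so that, for each $\rho$ there exists
indeed a transition with action $a$.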

Note that by definition of the next modality safeness for both approximations
requires in addition safeness of the dual approximations of $\in_{\equiv}$ to check that the
resulting processes are in $S^\#$.

Example 15. Consider a process T = x > 0 clx.Ti, a?x.T 2 , whose behavior is 

described by T Ti©T(g)T 2 ©T. Let the values abstraction 

be av{n) = • and let p# be the abstract environment with p#(cc) = •. For a 
formula A =< cln > true we obtain an upper approximation {(T, p#)} = 
II A ||“. In fact, there exists p G 7e(p^) such that p |= a; > 0 A (c?x = c?n) = 
X > 0 A true so that p# g|| x > 0 A true ||“. Analogously, {(T, p#)} =||< 
aln> true ||“. The abstract operator is safe, since the existence of both concrete 

transitions (T,pi) (Ti[n/x],pi) and (T,p 2 ) ^ (T 2 [n/x], P 2 ) for pi,p 2 G 
7 e(p#) is captured. Note that in the abstract process non-determinism among 
the two actions c?n and a?n arise even if in the concrete case these are two 
alternative choices. 

In contrast, let us consider the lower approximation for formula A. Since there
exist $\rho_1, \rho_2 \in \gamma_e(\rho^\#)$ such that $(T, \rho_1) \xrightarrow{c?n}$ and $(T, \rho_2) \xrightarrow{a?n}$, the lower
approximation is safe if and only if $(T, \rho^\#) \notin \|A\|^{l}$. Since $\|x > 0\|^{l} = \emptyset$,
$\|x \leq 0\|^{l} = \emptyset$ and $c?n = a?x = false$, $\rho^\# \notin \|(x > 0 \supset true) \wedge (x \leq 0 \supset false)\|^{l}$.
Therefore, we have both $\|\langle c?n \rangle\, true\|^{l} = \emptyset$ and $\|\langle a?n \rangle\, true\|^{l} = \emptyset$. The abstract
operator is able to observe that there is no real non-determinism between a?n and
c?n: this is an alternative choice, as expressed by $\oplus$, and there are concrete
processes for both alternatives.

On the other hand, consider the values abstraction with $\alpha_v(n) = Pos$, if
$n > 0$, and $\alpha_v(n) = Neg$, otherwise. There are two abstract environments: $\rho_1^\#$,
with $\rho_1^\#(x) = Pos$, and $\rho_2^\#$, with $\rho_2^\#(x) = Neg$. Since for each $\rho \in \gamma_e(\rho_1^\#)$,
$\rho \in \|c\|$ for $c = (x > 0 \supset true) \wedge (x \leq 0 \supset false)$, then $\rho_1^\# \in \|c\|^{l}$ is safe. With
this evaluation of the constraint we can safely obtain $\|\langle c?n \rangle\, true\|^{l} = \{(T, \rho_1^\#)\}$.
Since, for each concrete environment of $\rho_1^\#$, only the alternative $x > 0$ is possible,
it is safe to conclude that all concrete processes indeed perform c?n.
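
The difference between the two value abstractions can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: the names `Abs`, `lower_gt0` and `upper_gt0` are hypothetical and do not come from the paper, and the only constraint handled is x > 0. An abstract environment is placed in the lower approximation when every concrete environment it represents satisfies the constraint, and in the upper approximation when at least one does.

```python
from enum import Enum


class Abs(Enum):
    """Hypothetical abstract values: one-point abstraction plus sign abstraction."""
    ANY = "any"  # the single abstract value (the bullet in the text)
    POS = "Pos"  # abstracts every n > 0
    NEG = "Neg"  # abstracts every other n


def upper_gt0(envs, var="x"):
    """Abstract environments where `var > 0` holds for SOME concretization."""
    return {name for name, env in envs.items() if env[var] in (Abs.ANY, Abs.POS)}


def lower_gt0(envs, var="x"):
    """Abstract environments where `var > 0` holds for EVERY concretization."""
    return {name for name, env in envs.items() if env[var] is Abs.POS}


envs = {
    "rho_any": {"x": Abs.ANY},
    "rho_pos": {"x": Abs.POS},
    "rho_neg": {"x": Abs.NEG},
}
print(sorted(lower_gt0(envs)))  # ['rho_pos']
print(sorted(upper_gt0(envs)))  # ['rho_any', 'rho_pos']
```

With the one-point abstraction the lower approximation of x > 0 is empty, whereas the sign abstraction lets the environment mapping x to Pos enter it, which mirrors the gain in precision discussed in the example.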

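Lemma 16. Let $\|c\|^{l}$ and $\|c\|^{u}$ be safe lower and upper approximations for
$c \in C$. Moreover, let $\in^{l}_{=}$ and $\in^{u}_{=}$ be safe lower and upper approximations of $\in_{=}$.
For each $S^\# \in \wp(D^\#)$ and $K \in \{\tau, c?V, c!V\}$,
$\alpha^{u}(\|\langle K \rangle\|(\gamma(S^\#))) \subseteq \|\langle K \rangle\|^{u}(S^\#)$ and
$\|\langle K \rangle\|^{l}(S^\#) \subseteq \alpha^{l}(\|\langle K \rangle\|(\gamma(S^\#)))$.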

By lemma 1 the lower approximation of the full logic is safe. 

Theorem 17. Let $\|c\|^{l}$ and $\|c\|^{u}$ be safe lower and upper approximations for
$c \in C$. Moreover, let $\in^{l}_{=}$ and $\in^{u}_{=}$ be safe lower and upper approximations of $\in_{=}$.
For each closed μ-calculus formula A, $\|A\| \supseteq \gamma(\|A\|^{l})$.




7 About Optimality 

We have defined a method for constructing safe dual approximations of the
next modality that exploits dual approximations of constraints and of the relation
$\in_{=}$. In this setting the precision of abstract model checking depends on the
precision of these approximations. An interesting problem is that of finding
conditions on the approximations of constraints and of $\in_{=}$ that guarantee
optimality of the next modalities. It turns out that optimality of the upper
approximation of constraint $FREE_{a,i,j_i}$ and of the lower approximation of
constraint $CON_{a,j}$ is sufficient, whenever SP(p) contains only general processes.

Lemma 18. Let p be a closed process such that SP(p) = {GP* ,T*
where GP* = T* . Let $\|c\|^{u}$ and $\|c\|^{l}$ be optimal upper and lower approximations
for $c \in C$. For each $S^\# \in \wp(D^\#)$ and $K \in \{\tau, c?V, c!V\}$,
$\alpha^{u}(\|\langle K \rangle\|(\gamma(S^\#))) \supseteq \|\langle K \rangle\|^{u}(S^\#)$ and
$\alpha^{l}(\|\langle K \rangle\|(\gamma(S^\#))) \subseteq \|\langle K \rangle\|^{l}(S^\#)$.

In the upper approximation case this result is quite obvious, since by optimality
$\rho^\# \in \|FREE_{a,i,j_i}\|^{u}$ implies the existence of an environment $\rho$ such that
$\rho \in \|FREE_{a,i,j_i}\|$, namely the existence of a corresponding concrete transition. The
lower approximation is optimal whenever there exists a constrained transition
if and only if for each concrete process there exists a corresponding concrete
transition. Suppose that for each concrete process $(T, \rho) \xrightarrow{a} (T', \rho')$. For each $\rho$
there exists a choice such that constraint $c_i$ is satisfied and a non-deterministic
choice such that p ^ = a. Optimality is guaranteed, since theorem^ ensures
in addition that for each $\rho$ all other alternatives $c_j$ are not true. Therefore, for
each environment the constraint $CON_{a,j}$ is indeed satisfied, so that by optimality
the corresponding abstract transition is certainly considered.

In contrast, if SP(p) contains non-general processes lemma 2 is no longer
valid. Problems arise from the abstract evaluation of parameters included in the
definition of $\in_{=}$. It is sufficient to consider the case of $\rho^\#$ such that there exist
$\rho_1, \rho_2 \in \gamma_e(\rho^\#)$ with $\rho_1 \models FREE_{a,i,j_i}$ and $(T_{i,j_i}, \rho_2) \in_{=} \gamma(S^\#)$, but $\rho_1 \neq \rho_2$.

Depending on the domain of values and expressions, specific solutions must
be studied for approximating constraints and $\in_{=}$. We suggest a general
strategy. Safe dual approximations of constraints can be found on the basis of dual
approximations of the basic constraints be, e = e and $e \in V$, and by combining
lower and upper approximations in the obvious way.

Definition 19. For $c \in C$, let $\|\neg c\|^{u} = Env^\# \setminus \|c\|^{l}$, $\|\neg c\|^{l} = Env^\# \setminus \|c\|^{u}$,
$\|c_1 \wedge c_2\|^{l} = \bigcap_{i \in \{1,2\}} \|c_i\|^{l}$ and $\|c_1 \wedge c_2\|^{u} = \bigcap_{i \in \{1,2\}} \|c_i\|^{u}$.

If basic constraints are safely approximated, these approximations are obviously
safe. Unfortunately, they are in general non-optimal, since $\alpha^{u}$ does not preserve
$\cap$, while $\alpha^{l}$ does not preserve $\cup$.
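
The compositional scheme of this definition can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the tuple encoding of constraints, the function name `approx` and the three abstract environments are assumptions made for the example. Negation takes the complement of the opposite approximation, and conjunction intersects approximations of the same kind.

```python
def approx(c, all_envs, atoms):
    """Return the pair (lower, upper) of sets of abstract environments for c.

    c is a nested tuple: ("atom", name), ("not", c1) or ("and", c1, c2).
    atoms maps each atom name to a pair (lower, upper) assumed to be safe.
    """
    tag = c[0]
    if tag == "atom":
        return atoms[c[1]]
    if tag == "not":
        lo, up = approx(c[1], all_envs, atoms)
        # Negation swaps the roles of the two approximations.
        return all_envs - up, all_envs - lo
    if tag == "and":
        lo1, up1 = approx(c[1], all_envs, atoms)
        lo2, up2 = approx(c[2], all_envs, atoms)
        return lo1 & lo2, up1 & up2
    raise ValueError(f"unknown constraint: {c!r}")


# Hypothetical abstract environments: x is Pos, x is Neg, or x is unknown.
all_envs = frozenset({"pos", "neg", "any"})
atoms = {"x>0": (frozenset({"pos"}), frozenset({"pos", "any"}))}

x_pos = ("atom", "x>0")
contradiction = ("and", x_pos, ("not", x_pos))
lower, upper = approx(contradiction, all_envs, atoms)
print(sorted(lower), sorted(upper))  # [] ['any']
```

The constraint in the usage example is unsatisfiable, so an optimal upper approximation would be empty; the compositional one still contains the environment with unknown x, which is an instance of the loss of optimality noted above.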

Moreover, we have to compute dual approximations of $\in_{=}$, which realizes
the evaluation of parameters. Let $gen^\#_{T,GP} : Env^\# \to \wp(Env^\#)$ be a safe approximation
of $gen_{T,GP}$, where $gen_{T,GP}(\rho) \in \gamma_e(gen^\#_{T,GP}(\rho^\#))$ for each $\rho \in \gamma_e(\rho^\#)$. This
function can be suitably used to define approximations of $\in_{=}$.
Definition 20. Let $gen^\#_{T,GP}$ be a safe approximation. We define

1. $(T, \rho^\#) \in^{u}_{=} S^\#$ iff there exists $\rho_1^\# \in gen^\#_{T,GP}(\rho^\#)$ such that $(GP, \rho_1^\#) \in S^\#$;

2. $(T, \rho^\#) \in^{l}_{=} S^\#$ iff for each $\rho_1^\# \in gen^\#_{T,GP}(\rho^\#)$, $(GP, \rho_1^\#) \in S^\#$.

If $gen^\#_{T,GP}$ is safe, then the previous approximations of $\in_{=}$ are safe.
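
The two relations differ only in the quantifier over the abstract environments produced by the abstract parameter evaluation. A minimal Python sketch, assuming a toy `gen_abs` function and string-valued abstract environments (all names here are hypothetical and chosen for illustration only):

```python
def in_upper(rho_abs, s_abs, gen_abs, gp="GP"):
    """Upper relation: SOME generated abstract environment lands in S#."""
    return any((gp, r) in s_abs for r in gen_abs(rho_abs))


def in_lower(rho_abs, s_abs, gen_abs, gp="GP"):
    """Lower relation: EVERY generated abstract environment lands in S#."""
    return all((gp, r) in s_abs for r in gen_abs(rho_abs))


def gen_abs(rho_abs):
    # Toy stand-in for the abstract parameter evaluation: starting from an
    # unknown value, the parameter may turn out to be Pos or Neg.
    return {"pos", "neg"} if rho_abs == "any" else {rho_abs}


s_abs = {("GP", "pos")}
print(in_upper("any", s_abs, gen_abs), in_lower("any", s_abs, gen_abs))  # True False
print(in_lower("pos", s_abs, gen_abs))  # True
```

Note that in this sketch `in_lower` holds vacuously when `gen_abs` returns an empty set, a corner case the toy function never produces.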

The difficulties in computing optimal dual approximations are obvious.
However, it is important to stress the essential role of $\otimes$ and $\oplus$ in limiting the loss
of information in the lower approximation. The relations $\otimes$ and $\oplus$ allow us to
improve the precision of the lower approximation with respect to the “trivial”
definition, where a lower approximation of constraint $FREE_{a,i,j_i}$ is considered.

Example 21. Consider the process $T_1 \times T_2$ with $T_1 = x > 0 \triangleright c!x.T, a!x.T$ and
T 2 = dy+l.T+a\y—l.T of example 3. With respect to the abstraction $\alpha_v(n) = \bullet$,
for each $n \in Nat$, we trivially obtain $\rho^\# \in \|c\|^{l}$ for
$c = (x > 0 \supset (\tau = \tau)) \wedge (x \leq 0 \supset (\tau = \tau))$ and $\rho^\#(x) = \bullet$, since $\tau = \tau = true$.
Therefore, $(T_1 \times T_2, \rho^\#) \in \|\langle \tau \rangle\, true\|^{l}$ is established. The lower approximation
of constraint c captures that there are processes for both alternatives, but in
both cases the action $\tau$ can be performed. In other words, a constrained
transition $(T_1 \times T_2, \rho^\#) \xrightarrow{\tau}$ is constructed. If we considered the trivial definition
of lower approximation, we would not be able to prove it, since neither
$\rho^\# \in \|x > 0 \wedge (\tau = \tau)\|^{l}$ nor $\rho^\# \in \|x \leq 0 \wedge (\tau = \tau)\|^{l}$, even if the evaluation of
constraints is optimal. Thus, this trivial method succeeds only if the alternative
choice is the same for each concrete process. In contrast, due to $\otimes$ and $\oplus$ a
weaker condition can be considered and a more precise result is achieved. The
proposed method could not give the same results on a classical symbolic
semantics, where branching represents both non-deterministic and alternative
choices, since it exploits the existence of exactly one alternative choice for each
process and the representation of all non-deterministic choices for each possibility.



8 Related Works 



The combination of abstract interpretation and model checking has been the
topic of intensive research in the last few years. Much of the work concerned the
definition of safe abstract models, namely of safe abstract transition relations,
that preserve universal properties mm and both universal and existential
properties j I I f mmm- mm propose the use of constrained and free
transitions to handle branching modalities and pa tackles also the problem of
effectively computing an abstract model for very simple programs. The proposed
method suffers from the problems of the trivial definition shown in example 7. A
slightly different approach is the one of Kelb, who investigates conditions for
the safeness of abstract μ-calculus semantics rather than for the safeness of
abstract models. In order to handle non-monotonic negation the combination of
dual approximations is suggested. However, the problem of computing safe dual
approximations of the next modality even over a given abstract model is not
addressed. In the framework of value-passing concurrent processes 0 proposes
an abstract labelled transition system semantics for abstract closed processes
obtained by an abstraction on the values domain. It is not obvious which class
of temporal properties is preserved by the abstract model. In it has been
introduced the idea of representing the relations of non-determinism and of
alternative choice among actions in order to compute more precise safe and
constrained transition relations in the framework of closed abstract processes
rather than in a symbolic approach with environments. Schmidt H21 shows a
methodology for computing a finite approximate semantics of value-passing CCS
by finitely approximating the semantics over abstract environments as a regular
tree. Such a semantics is adequate for the verification of universal properties
only. As far as the symbolic semantics is concerned, it is worth mentioning that
other approaches have been proposed for representing regular processes by finite
graphs m-

9 Conclusions 

In this paper we have applied abstract interpretation to the verification of
μ-calculus properties of value-passing concurrent processes. The main contribution
is the definition of a finite symbolic semantics of processes that differs from the
classical one m in some aspects. First, classical branching of transitions is
replaced by explicit relations of alternative and non-deterministic choice among
transitions. Moreover, infinite branches are avoided by representing current
recursive calls by means of general processes. The concrete semantics of the
μ-calculus can suitably be computed by interpreting the symbolic graph over
concrete environments, but due to infinite values it is not effectively computable.
We have proposed a technique to compute a lower approximate semantics on the
symbolic graph by replacing concrete environments with abstract environments.
Following the approach of for explicitly treating negation, the lower approximate
semantics has been obtained by combining lower and upper approximations for
each operator of the logic except negation. The relations of non-deterministic and
alternative choice turn out to be very useful to approximate the next modality.
The lower approximation in particular turns out to be undoubtedly more precise
than previous proposals [I IIIDj .

With respect to the classical approach to abstract model checking, the
proposed method does not rely on the construction of a safe abstract model, but on
the computation of safe approximations of the “model checking” functions over
the symbolic graph. This approach has several advantages. In order to prove a
property it is typically necessary to successively refine the chosen value
abstraction by adding more information. In this way the construction of a new
abstract model is avoided. Moreover, this approach fits well in the traditional
abstract interpretation framework and allows us to reason about safety and
precision without introducing ad-hoc conditions, such as the approximation
ordering between abstract models of ginj. Recently, Schmidt UHl following the
ideas of m has pointed out the very close connection between abstract model
checking and data-flow analysis. These results suggest that the methods from
one area can be fruitfully used in the other. There are many directions in which
to continue this research. For instance, it seems interesting to study whether the
refinement operators [7] that have been designed to systematically construct new,
more precise domains can be applied to the abstract model checking framework.
A Appendix



Table 1. The concrete semantics 



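$$
\tau.p \xrightarrow{\tau} p
\qquad
c?x.p \xrightarrow{c?v} p[v/x] \quad v \in Val
\qquad
c!e.p \xrightarrow{c!v} p \quad S_v(e) = v
$$

$$
\frac{p_i \xrightarrow{a} p_i'}{p_1 + p_2 \xrightarrow{a} p_i'}
\qquad
\frac{p_1 \xrightarrow{a} p_1'}{p_1 \times p_2 \xrightarrow{a} p_1' \times p_2}\ int_1
\qquad
\frac{p_2 \xrightarrow{a} p_2'}{p_1 \times p_2 \xrightarrow{a} p_1 \times p_2'}\ int_2
$$

$$
\frac{p_1 \xrightarrow{a} p_1' \quad p_2 \xrightarrow{\bar{a}} p_2'}{p_1 \times p_2 \xrightarrow{\tau} p_1' \times p_2'}\ sync
\qquad
\frac{T[e/x] \xrightarrow{a} p'}{P(e) \xrightarrow{a} p'}\ rec \quad P(x) = T
\qquad
\frac{p_1 \xrightarrow{a} p_1'}{be \triangleright p_1, p_2 \xrightarrow{a} p_1'} \quad S_b(be) = tt
$$

$$
\frac{p_2 \xrightarrow{a} p_2'}{be \triangleright p_1, p_2 \xrightarrow{a} p_2'} \quad S_b(be) = ff
\qquad
\frac{p \xrightarrow{a} p'}{p \setminus L \xrightarrow{a} p' \setminus L} \quad chan(a) \cap L = \emptyset
$$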



The semantic rules for the symbolic semantics are based on some operations 
over actions. Let © ... © and 1 ?*^, = © . . . © 

for i G { 1 , 2 }, ji G { 1 ,©}, we define 

“ ^©02 = ici,ji^C2j2,®hie{Ki}0i,ji,hi®h2e{i,K2}&i,j2,h2®{*}) and = 

for ATi =_{hi G {lAi*} I ^i,ji,hi 7^ *}; 
~ ^ 1,31 = (be A © ... © and 02, 72 = hbe A 02,72,6*2,72,1 ® 

... © ^2,72,^72)? and 1^1,71 — ®/i, } A,7 i ,/ ij with \/ T\,T2 if 

Oi,ji,hi = * and = Tij.^hi otherwise; 

~ ^31,32 ~ A C2,72 , ©/ijg{i,fc7^} 0 i,ji,hi X ^2,72, ^12) and ^j^j2 ~ ®^i6{iA7jl 
X 7 ^ 2 , 72 .© ^here 







1- if G {T,de} and 02j2A2 = *> then ^hjiM ^ 

Tij,m = and (interleaving); 

2. if = c!x and 6*2j2A2 = * then x (^ 2 ,j 2 M ~ c?z, such 

that is free for \zjx\ and T2 is free for Moreover, T( = 

i^i.iiAi[^/a;] and = Ti (interleaving); 

3. if = 6'2,i2j2 = *. then 0imM ^ ^2j2./i2 = * and = Ti and 

'^ij2,h2 = ^2 (idle action); 

4. 0i,ji,hi = c!e and 02j^,h2 = c!x, then x 92,j^,h2 = t and = 

ifi.ii Ai and T!^ = 72j2,;i2 [e/a;] (synchronization); 

5. symmetric rules; 

- for 6>i = (cj,0i,i©. . .©6»i,„J and Qi = Ti,i©. . .©Ti_„., o) = (©, ©j,g/c,6»ijJ 
and Ki = {ji G {l,ni} | chan{6ijJ n A = 0 and q} = \ L. 
Table 2. The symbolic semantics

Abstract. The interconnection structure of mobile systems is very difficult
to predict, since communication between component agents may
carry information which dynamically changes that structure. In this paper
we design an automatic analysis for statically determining all potential
links between the agents of a mobile system specified in the π-calculus.
For this purpose, we use a nonstandard semantics of the π-calculus
which allows us to describe precisely the linkage of agents. The
analysis algorithm is then derived by abstract interpretation of this
semantics.

Key words: π-calculus, nonstandard semantics, abstract interpretation.



1 Introduction 

We are interested in analyzing the evolution of the interconnection structure, or
communication topology, in a mobile system of processes, abstracting away all
computational aspects but communication. Therefore, we can restrict our study
to the π-calculus nm, which is a widely accepted formalism for describing
communication in mobile systems. Whereas the communication topology of
systems written in CSP m or CCS ca can be directly extracted from the text of
the specification, a semantic analysis is required in the π-calculus, because
communication links may be dynamically created between agents. In the absence of
automatic analysis tools this makes the design and debugging of mobile systems
very difficult tasks (see m for a detailed case study). In this paper we propose
a semantic analysis of the π-calculus based on Abstract Interpretation for
automatically inferring approximate but sound descriptions of communication
topologies in mobile systems.

In a previous work m we have presented an analysis of the π-calculus which
relies on a nonstandard concrete semantics. In that model recursively defined
agents are identified by the sequence of replication unfoldings from which they
stem, whereas the interconnection structure is given by an equivalence relation
on the agent communication ports. That semantics is inspired by a
representation of sharing in recursive data structures PI which has been applied to
alias analysis m. However, the equivalence relation does not capture an important
piece of information for debugging and verification purposes: the instance of the
channel that establishes a link between two agents. In this paper we redesign
our previous analysis in order to take this information into account, while still
preserving a comparable level of accuracy. Surprisingly enough, whereas our
original analysis was rather complicated, involving heavy operations like
transitive closure of binary relations, the refined one is tremendously simpler and
only requires very basic primitives.

The paper is organized as follows. In Sect. 2 we introduce our representation
of mobile systems in the π-calculus. Section 3 describes the nonstandard
semantics of mobile systems, which makes instances of recursively defined agents
and channels explicit. The abstract interpretation gathering information on
communication topologies is constructed in Sect. 4. In Sect. 5 we design a computable
analysis which is able to infer accurate descriptions of unbounded and nonuniform
communication topologies. Related work is discussed in Sect. 6.



2 Mobile Systems in the π-Calculus



We consider the asynchronous version of the polyadic π-calculus which was
introduced by Turner PH as a semantic basis of the PICT programming language.
This restricted version has simpler communication primitives and a more
operational flavour than the full π-calculus, while still ensuring a high expressive
power^1. Let $\mathcal{N}$ be a countable set of channel names. The syntax of processes is
given by the following grammar:



P ::= $c![x_1, \ldots, x_n]$        Message
   |  $c?[x_1, \ldots, x_n].P$      Input guard
   |  $*c?[x_1, \ldots, x_n].P$     Guarded replication
   |  $(P \mid P)$                  Parallel composition
   |  $(\nu x)P$                    Channel creation



where $c, x, x_1, \ldots, x_n$ are channel names. Input guard and channel creation act as
name binders, i.e. in the process $c?[x_1, \ldots, x_n].P$ (resp. $(\nu x)P$) the occurrences
of $x_1, \ldots, x_n$ (resp. x) in P are considered bound. Usual rules about scoping,
α-conversion and substitution apply. We denote by fn(P) the set of free names
of P, i.e. those names which are not in the scope of a binder.
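
As an illustration of the grammar and of the binding rules, the following Python sketch encodes processes as immutable data classes and computes fn(P). The class and field names (`Msg`, `In`, `Par`, `New`, `replicated`) are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Msg:  # c![x1, ..., xn]
    chan: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class In:  # c?[x1, ..., xn].P, or *c?[x1, ..., xn].P when replicated
    chan: str
    params: Tuple[str, ...]
    body: "Process"
    replicated: bool = False


@dataclass(frozen=True)
class Par:  # (P | Q)
    left: "Process"
    right: "Process"


@dataclass(frozen=True)
class New:  # (nu x)P
    name: str
    body: "Process"


Process = Union[Msg, In, Par, New]


def fn(p: Process) -> frozenset:
    """Free names: names not in the scope of an input guard or a channel creation."""
    if isinstance(p, Msg):
        return frozenset({p.chan, *p.args})
    if isinstance(p, In):
        return frozenset({p.chan}) | (fn(p.body) - frozenset(p.params))
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, New):
        return fn(p.body) - {p.name}
    raise TypeError(f"not a process: {p!r}")


# (nu x)(c![x] | c?[y].y![z])
example = New("x", Par(Msg("c", ("x",)), In("c", ("y",), Msg("y", ("z",)))))
print(sorted(fn(example)))  # ['c', 'z']
```

In the usage example x is bound by the channel creation and y by the input guard, so only c and z remain free.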

Following the CHAM style P , the standard semantics of the π-calculus is given
by a structural congruence and a reduction relation on processes. The congruence
relation “$\equiv$” satisfies the following rules:

(i) $P \equiv Q$ whenever P and Q are α-equivalent.

(ii) $P \mid Q \equiv Q \mid P$.

(iii) $P \mid (Q \mid R) \equiv (P \mid Q) \mid R$.

(iv) $(\nu x)(\nu y)P \equiv (\nu y)(\nu x)P$.

(v) $(\nu x)P \mid Q \equiv (\nu x)(P \mid Q)$ if $x \notin fn(Q)$.

The reduction relation is defined in Fig. 1, where $P[x_1/y_1, \ldots, x_n/y_n]$ denotes
the result of substituting every name $x_i$ for the name $y_i$ in P. This may involve
α-conversion to avoid capturing one of the $x_i$'s.



^1 We can encode the lazy λ-calculus, for example p.



